Stability of Frequency Feedback Receivers
Under Steps in Input Frequency 


By V. E. BENES 
(Manuscript received January 19, 1971) 


It is known that frequency feedback demodulators can show instability in their response to step changes (mistuning) in input frequency. This work reports on some mathematical analyses of this phenomenon as described by differential equations arising from simple IF and feedback filters in the demodulator. These equations are studied for local and global stability by geometric or phase-plane analysis, by means of Lyapunov functions, and by the topological Poincaré-Bendixson methods. A typical result is for the case of no feedback filter and a one-pole baseband analog of the IF filter. It states in physical terms that if the mistuning is not too big, specifically if

|mistuning| < (half-power IF bandwidth)(1 + feedback gain),

then solutions which are bounded away from zero amplitude approach the natural equilibrium point. Examples are given in which a sufficiently large mistuning makes the equilibrium point unstable.


I. INTRODUCTION 


The frequency feedback (or frequency compression) demodulator for FM signals was proposed by J. G. Chaffee¹ in 1937. Some twenty-five years later, Chaffee's idea was found to have a particularly fitting application in the satellite communications experiments Echo² and Telstar,³ in which there was a high premium on detecting a low-power wide-band FM signal in noise. Nevertheless, since its invention, little progress has been made in the mathematical analysis of this circuit. Approximate methods of analysis and synthesis have been proposed, and some of them have been experimentally verified as useful.⁴,⁵ However, except for unpublished works by S. O. Rice and T. R. Williams, the nonlinear character of the circuit away from equilibrium positions has not been considered.

It is the aim of this paper to formulate briefly one of the problems 
arising in the analysis of the FM with feedback (FMFB) receiver, 
namely that of stability of its response to step changes in input fre- 
quency. We shall write equations describing this response and present 
results about local and global stability of solutions for simple cases. 


II. CIRCUIT DESCRIPTION 


The FMFB receiver has been extensively discussed in recent publications,⁴,⁵ so only a brief description of it is included here. Roughly
speaking, the receiver is a conventional FM demodulator, with a local 
oscillator whose frequency is controlled linearly by the output of the 
detector. The object of this control is to reduce the index of modulation 
at the output of the mixer, so as to be able to use a narrower IF filter 
than in a conventional FM receiver, and thus to eliminate some of the 
noise accompanying the input signal. The action of the circuit is to 
follow the slowly varying frequency of an FM wave while looking at 
it through a moving narrow frequency “window.” 

The circuit is closely related to the phase-locked oscillator, but it is distinguished from that device by amplitude effects that are absent in the latter. Mathematically this distinction takes the form that in FMFB there is an amplitude variable for every phase variable, while in phase-lock the amplitude variables do not appear. Their presence critically affects and complicates circuit analysis: thus the simplest FMFB equation is in two dimensions, while the simplest phase-lock equation is the pendulum equation, in one. The FMFB receiver resembles the phase-locked oscillator in that both devices work by phase-locking onto an FM wave that varies slowly over a limited range. If this range is exceeded, locking fails and oscillation can set in. This phenomenon is well known in phase-locked oscillators, and in FMFB receivers a similar behavior has been described by L. H. Enloe.⁴ This stability problem is the one we address. We attempt an analytical study of the stability of simple differential equations describing the mistuning of the incoming signal away from the normal carrier frequency.

A typical result we prove states that if the mistuning $\omega_d$ is not too big, specifically, in physical terms, if (for a one-pole baseband analog of the IF filter)

$$|\omega_d| < (\text{half-power IF bandwidth})(1 + \text{feedback gain}),$$

then, for the simplest receiver, solutions which are bounded away from zero amplitude approach the equilibrium or critical point. This and similar results are proved by using the Poincaré-Bendixson theory, or with the help of Lyapunov functions.


III. EQUATIONS FOR RECEIVER WITH IDEAL DETECTOR 


We shall write equations for the FMFB receiver (see Fig. 1) under the assumption that it contains an ideal frequency detector. That is, we assume that if the signal leaving the IF filter is $a(t)\cos(\omega t + \theta(t))$, then the detector produces the output $\dot\theta(t)$. Let the mixer input be

$$x_c\cos\omega_1 t - x_s\sin\omega_1 t,$$

and let the mixer multiply this input by

$$2\cos(\omega_2 t + \beta\varphi(t)),$$

where $\beta$ (in practice and here $> 0$) is the feedback gain, and $\varphi$ is the feedback signal. It is assumed that the IF filter is tuned to the difference frequency $\omega = \omega_1 - \omega_2$, and can be represented by an impulse response


[Fig. 1 labels: IF filter and discriminator (perfect detector); IF output essentially $u_c\cos\omega t - u_s\sin\omega t$, with $\omega = \omega_1 - \omega_2$; dc and baseband amplifier-filter.]

Fig. 1—FMFB receiver, block diagram. 
of the form $2f(t)\cos\omega t$, with $f(\cdot)$ a baseband response such that $f(t) = 0$ for $t < 0$. The sum $(\omega_1 + \omega_2)$ components of the mixer output are essentially removed by the IF filter, and will be ignored. The difference $(\omega_1 - \omega_2)$ components at the output of the mixer are

$$\cos\omega t\,\{x_c\cos\beta\varphi + x_s\sin\beta\varphi\} - \sin\omega t\,\{x_s\cos\beta\varphi - x_c\sin\beta\varphi\}.$$

The response of the IF filter to these components has the form

$$\cos\omega t\int_0^t f(t-u)\{x_c(u)\cos\beta\varphi(u) + x_s(u)\sin\beta\varphi(u)\}\,du - \sin\omega t\int_0^t f(t-u)\{x_s(u)\cos\beta\varphi(u) - x_c(u)\sin\beta\varphi(u)\}\,du$$

+ terms around $2\omega$
+ terms representing initial conditions.

We shall assume that the passband of $f(\cdot)$ is small compared to $2\omega$, so that the components around $2\omega$ may be ignored as well.

To complete the loop equations we must indicate how the feedback $\varphi(\cdot)$ is determined from the output of the IF filter. We set, for $t > 0$,

$$u_c(t) = r_c(t) + \int_0^t f(t-u)\{x_c(u)\cos\beta\varphi(u) + x_s(u)\sin\beta\varphi(u)\}\,du,$$

$$u_s(t) = r_s(t) + \int_0^t f(t-u)\{x_s(u)\cos\beta\varphi(u) - x_c(u)\sin\beta\varphi(u)\}\,du,$$

where $r_c(\cdot)$, $r_s(\cdot)$ represent the effects of initial conditions in the filter at $t = 0$. Exclusive of the carrier, the angle modulation of the IF output

$$u_c\cos\omega t - u_s\sin\omega t$$

is just

$$\theta = \tan^{-1}\frac{u_s}{u_c},$$

corresponding to an instantaneous frequency

$$\dot\theta = \frac{u_c\dot u_s - u_s\dot u_c}{a^2},$$

where $a = (u_c^2 + u_s^2)^{1/2}$. This is the output of the ideal detector. The feedback frequency $\dot\varphi(\cdot)$ controlling the voltage-controlled oscillator is obtained by filtering $\dot\theta(\cdot)$. Thus

$$\dot\varphi(t) = r(t) + \int_0^t k(t-u)\,\dot\theta(u)\,du, \qquad t > 0,$$

where $k(\cdot)$ is the impulse response of the feedback filter, and $r(\cdot)$ represents the effect of initial conditions at $t = 0$.


IV. DIFFERENTIAL EQUATION FOR THE SIMPLEST CASE 


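When the baseband responses $f(\cdot)$ and $k(\cdot)$ correspond to filters with rational transfer functions, the integral equations for $u_c(\cdot)$ and $u_s(\cdot)$ can be turned into differential equations in a well-known way. Consider the simplest case, where there is no feedback filter (so that, apart from initial conditions, $\varphi = \theta$) and $f(\cdot)$ corresponds to a one-pole, no-zero filter with transfer function $\mu/(\lambda + s)$. In this case we obtain the equations

$$\dot u_c = -\lambda u_c + \mu[x_c\cos\beta\theta + x_s\sin\beta\theta],$$

$$\dot u_s = -\lambda u_s + \mu[x_s\cos\beta\theta - x_c\sin\beta\theta],$$

$$\dot\theta = \frac{\mu[u_c(x_s\cos\beta\theta - x_c\sin\beta\theta) - u_s(x_c\cos\beta\theta + x_s\sin\beta\theta)]}{u_c^2 + u_s^2}.$$

The introduction of polar coordinates $u_c = a\cos\theta$, $u_s = a\sin\theta$ simplifies these equations to

$$\dot\theta = \frac{\mu}{a}\bigl(x_s\cos(\beta+1)\theta - x_c\sin(\beta+1)\theta\bigr), \qquad \dot a = -\lambda a + \mu\bigl(x_c\cos(\beta+1)\theta + x_s\sin(\beta+1)\theta\bigr). \tag{1}$$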
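The passage to the polar form (1) rests on the projections $\dot a = \dot u_c\cos\theta + \dot u_s\sin\theta$ and $a\dot\theta = -\dot u_c\sin\theta + \dot u_s\cos\theta$. The Python sketch below checks the identity at random points. It treats the input components $x_c, x_s$ as constants at the instant considered, and the parameter ranges are arbitrary illustrative choices.

```python
import math
import random

def cartesian_rhs(u_c, u_s, x_c, x_s, lam, mu, beta):
    """u_c, u_s equations with no feedback filter (phi = theta = atan2(u_s, u_c))."""
    bt = beta * math.atan2(u_s, u_c)
    du_c = -lam * u_c + mu * (x_c * math.cos(bt) + x_s * math.sin(bt))
    du_s = -lam * u_s + mu * (x_s * math.cos(bt) - x_c * math.sin(bt))
    return du_c, du_s

def polar_rhs(a, theta, x_c, x_s, lam, mu, beta):
    """Right-hand sides of equations (1)."""
    ph = (beta + 1.0) * theta
    da = -lam * a + mu * (x_c * math.cos(ph) + x_s * math.sin(ph))
    dtheta = (mu / a) * (x_s * math.cos(ph) - x_c * math.sin(ph))
    return da, dtheta

def max_mismatch(trials=1000, seed=1):
    """Largest discrepancy between (1) and the projected Cartesian velocity."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        a, theta = rng.uniform(0.1, 3.0), rng.uniform(-3.0, 3.0)
        x_c, x_s = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        lam, mu, beta = rng.uniform(0.1, 2.0), rng.uniform(0.1, 2.0), rng.uniform(0.0, 5.0)
        du_c, du_s = cartesian_rhs(a * math.cos(theta), a * math.sin(theta),
                                   x_c, x_s, lam, mu, beta)
        # Radial and angular components of the Cartesian velocity.
        da_cart = du_c * math.cos(theta) + du_s * math.sin(theta)
        dtheta_cart = (-du_c * math.sin(theta) + du_s * math.cos(theta)) / a
        da, dtheta = polar_rhs(a, theta, x_c, x_s, lam, mu, beta)
        worst = max(worst, abs(da - da_cart), abs(dtheta - dtheta_cart))
    return worst

print(max_mismatch())
```

The printed mismatch is at the level of floating-point rounding. Note that the $\lambda$ terms drop out of $\dot\theta$ entirely, as the text's expression for $\dot\theta$ indicates.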
We first consider the stability of equations (1) when the input to the demodulator consists of the carrier $\cos\omega_1 t$ alone, with no signal. In this case $x_c = 1$, $x_s = 0$, and the equations are

$$\dot\theta = -\frac{\mu}{a}\sin(\beta+1)\theta, \qquad \dot a = -\lambda a + \mu\cos(\beta+1)\theta. \tag{2}$$

We recall that the critical points of a differential equation $\dot x = v(x)$ are the points $x$ in the phase space at which $v(x) = 0$. Those of the system (2) are then the points in the $a, \theta$ plane at which simultaneously $\dot\theta = \dot a = 0$, namely

$$a = \frac{\mu}{\lambda}, \qquad \theta = \frac{2n\pi}{\beta+1}, \quad n \text{ an integer}.$$

Because of the periodic dependence of the right-hand side of (2) on $\theta$, it is possible and convenient to define $\zeta = (\beta+1)\theta$, to write (2) as

$$\dot\zeta = -\frac{\mu(\beta+1)}{a}\sin\zeta, \qquad \dot a = -\lambda a + \mu\cos\zeta, \tag{3}$$

and to consider only principal values of $\zeta$, and thus only the critical point $(\mu/\lambda, 0)$ in the plane specified by the polar coordinates $(a, \zeta)$.


Theorem 1: The equations (3) are globally asymptotically stable for all positive $\lambda$ and $\mu$; all solutions tend to the critical point $(\mu/\lambda, 0)$ in an exponential manner; $\zeta$ is monotone, and $a$ is either monotone or has one minimum.


Proof: We start with a heuristic direct analysis of the trajectories. Consider in Fig. 2 the circle $C$ in the $a, \zeta$ plane defined by $\dot a = 0$, that is, $a = (\mu/\lambda)\cos\zeta$. With $x = a\cos\zeta$, $y = a\sin\zeta$ we shall examine the directions of the trajectories of (3) at points on $C$. The equation of $C$ is

$$\left(x - \frac{\mu}{2\lambda}\right)^2 + y^2 = \left(\frac{\mu}{2\lambda}\right)^2.$$

Since $C$ is the locus $\dot a = 0$, it is apparent that on $C$ each trajectory (has a tangent that) is perpendicular to the radius from the origin.


[Fig. 2 labels: axes $x$, $y$; vector field defined by equations (3); critical point $(\mu/\lambda, 0)$ in the $(a, \zeta)$ plane; circle $C$, or $\dot a = 0$, or $a = (\mu/\lambda)\cos\zeta$.]

Fig. 2—Phase plane for no mistuning. 
By symmetry about the $x$-axis, we can restrict attention to $y \ge 0$. The slope of $C$ at $x, y$ is

$$\frac{dy}{dx} = \frac{1}{2y}\left(\frac{\mu}{\lambda} - 2x\right) = \frac{1}{y}\left(\frac{\mu}{2\lambda} - x\right),$$

while the slope of the line through $x, y$ perpendicular to the radius from the origin is

$$-\frac{x}{y}.$$

For $0 < x < \mu/\lambda$ (and $y > 0$),

$$-\frac{x}{y} < \frac{1}{y}\left(\frac{\mu}{2\lambda} - x\right).$$

Since $\dot\zeta < 0$ on the upper half of $C$, a trajectory crossing it moves to the right. It does so at a smaller slope than $C$, and so it passes into the interior of $C$, where $\dot a > 0$.

Thus every trajectory is entering $C$ on $\dot a = 0$ except at the origin and at the critical point. $\zeta$ is decreasing in $\zeta > 0$. If a trajectory ever crosses $C$ it can never again recross it and must approach the critical point; in this case $a$ has a single minimum. If a trajectory never crosses $C$, it must simply slip into the critical point, because then both $a$ and $\zeta$ decrease and are bounded below.

These preliminaries lead us to define the Lyapunov function

$$V = \frac{1}{2}\left[\left(\frac{\mu}{\lambda} - a\cos\zeta\right)^2 + (a\sin\zeta)^2\right] = \frac{1}{2}\,(\text{distance from } (a, \zeta) \text{ to the critical point})^2.$$

Evidently $V \ge 0$, and $V = 0$ only at $(\mu/\lambda, 0)$. The rate of change of $V$ along trajectories of (3) is

$$\dot V = \dot a\left(a - \frac{\mu}{\lambda}\cos\zeta\right) + \frac{\mu}{\lambda}\,a\dot\zeta\sin\zeta = -\lambda\left(a - \frac{\mu}{\lambda}\cos\zeta\right)^2 - \frac{\mu^2(\beta+1)}{\lambda}\sin^2\zeta = -2\lambda V - \frac{\mu^2\beta}{\lambda}\sin^2\zeta < 0$$

except at the critical point, where $V = \dot V = 0$. It follows from Theorem II, p. 37 of Ref. 6, that the system (3) is globally asymptotically stable: all solutions tend exponentially to the critical point with reciprocal time constant $2\lambda$. When $\lambda = \mu$, $2\lambda$ has the physical interpretation

$$2\lambda = 2 \times (\text{half-power IF bandwidth}).$$
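The identity $\dot V = -2\lambda V - (\mu^2\beta/\lambda)\sin^2\zeta$ can be spot-checked numerically. The Python sketch below evaluates $\dot V$ by the chain rule along (3), using $\partial V/\partial a = a - (\mu/\lambda)\cos\zeta$ and $\partial V/\partial\zeta = (\mu/\lambda)a\sin\zeta$. It then compares the result with the closed form at random points. The sampling ranges are arbitrary assumptions.

```python
import math
import random

def rhs3(a, zeta, lam, mu, beta):
    """Equations (3): carrier alone, no mistuning, no feedback filter; returns (a_dot, zeta_dot)."""
    return -lam * a + mu * math.cos(zeta), -mu * (beta + 1.0) * math.sin(zeta) / a

def V(a, zeta, lam, mu):
    """Half the squared distance from (a cos zeta, a sin zeta) to the critical point (mu/lam, 0)."""
    return 0.5 * ((mu / lam - a * math.cos(zeta)) ** 2 + (a * math.sin(zeta)) ** 2)

def V_dot_chain_rule(a, zeta, lam, mu, beta):
    da, dzeta = rhs3(a, zeta, lam, mu, beta)
    dV_da = a - (mu / lam) * math.cos(zeta)        # partial derivative of V with respect to a
    dV_dzeta = (mu / lam) * a * math.sin(zeta)     # partial derivative of V with respect to zeta
    return dV_da * da + dV_dzeta * dzeta

def V_dot_closed_form(a, zeta, lam, mu, beta):
    return -2.0 * lam * V(a, zeta, lam, mu) - (mu ** 2 * beta / lam) * math.sin(zeta) ** 2

def worst_relative_error(trials=1000, seed=0):
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        args = (rng.uniform(0.05, 5.0), rng.uniform(-math.pi, math.pi),
                rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0), rng.uniform(0.0, 10.0))
        lhs, rhs = V_dot_chain_rule(*args), V_dot_closed_form(*args)
        assert rhs <= 0.0                          # V never increases along (3)
        worst = max(worst, abs(lhs - rhs) / (1.0 + abs(rhs)))
    return worst

print(worst_relative_error())
```

The two expressions agree to rounding error. The negative sign of the closed form is what the Lyapunov argument uses.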
V. MISTUNING IN THE SIMPLEST CASE 
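Let us assume that in equation (1) we have

$$x_s = \sin\omega_d t, \qquad x_c = \cos\omega_d t,$$

corresponding to the "mistuned" carrier input $\cos(\omega_1 + \omega_d)t$, or to the constant modulating signal $\omega_d$. The equation (1) assumes the form

$$\dot\theta = \frac{\mu}{a}\sin\bigl(\omega_d t - (\beta+1)\theta\bigr), \qquad \dot a = -\lambda a + \mu\cos\bigl(\omega_d t - (\beta+1)\theta\bigr),$$

or with $\zeta = \omega_d t - (\beta+1)\theta$,

$$\dot\zeta = \omega_d - \frac{\mu(\beta+1)}{a}\sin\zeta, \qquad \dot a = -\lambda a + \mu\cos\zeta. \tag{4}$$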


The critical point of this system is determined by the conditions

$$a = \frac{\mu}{\lambda}\cos\zeta, \qquad \zeta = \tan^{-1}\frac{\omega_d}{\lambda(\beta+1)}. \tag{5}$$

It is important to note that because of the possibility of going to low amplitudes there always exist critical points, regardless of the value of $\omega_d$. This situation is in sharp contrast with the phase-locked oscillator. For a filterless phase-locked oscillator the equation corresponding to (4) would be

$$\dot\zeta = \omega_d - \mu\sin\zeta,$$

which has no critical point if $|\omega_d| > \mu$. Thus in phase-lock there is usually a critical frequency deviation above which locking is impossible for lack of critical points, and below which it may or may not occur. In the FMFB receiver, though, the critical points always exist but, as we shall see later, they are not always stable.
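The contrast with phase-lock, and the linearization carried out next, can both be checked numerically. The Python sketch below computes the critical point (5) for several mistunings and confirms that it zeroes the right-hand side of (4). It then compares the coefficients of $\det(sI - A)$ from a finite-difference Jacobian with $\lambda(\beta+2)$ and $\lambda^2(\beta+1) + \omega_d^2/(\beta+1)$. The parameter values $\lambda = \mu = 1$, $\beta = 4$ are illustrative assumptions.

```python
import math

def rhs4(zeta, a, omega_d, lam, mu, beta):
    """Equations (4): mistuned carrier, no feedback filter; returns (zeta_dot, a_dot)."""
    return omega_d - mu * (beta + 1.0) * math.sin(zeta) / a, -lam * a + mu * math.cos(zeta)

def critical_point(omega_d, lam, mu, beta):
    """Equation (5): exists for every omega_d."""
    zeta0 = math.atan(omega_d / (lam * (beta + 1.0)))
    return zeta0, (mu / lam) * math.cos(zeta0)

def pll_has_critical_point(omega_d, mu):
    """zeta_dot = omega_d - mu sin(zeta) has a zero only if |omega_d| <= mu."""
    return abs(omega_d) <= mu

def linearization_coeffs(omega_d, lam, mu, beta, h=1e-6):
    """(-trace, det) of a central-difference Jacobian of (4) at (5),
    i.e. the coefficients of s and of 1 in det(sI - A)."""
    z0, a0 = critical_point(omega_d, lam, mu, beta)
    fz_p = rhs4(z0 + h, a0, omega_d, lam, mu, beta)
    fz_m = rhs4(z0 - h, a0, omega_d, lam, mu, beta)
    fa_p = rhs4(z0, a0 + h, omega_d, lam, mu, beta)
    fa_m = rhs4(z0, a0 - h, omega_d, lam, mu, beta)
    d_dz = [(p - m) / (2 * h) for p, m in zip(fz_p, fz_m)]   # column of derivatives w.r.t. zeta
    d_da = [(p - m) / (2 * h) for p, m in zip(fa_p, fa_m)]   # column of derivatives w.r.t. a
    trace = d_dz[0] + d_da[1]
    det = d_dz[0] * d_da[1] - d_da[0] * d_dz[1]
    return -trace, det

lam, mu, beta = 1.0, 1.0, 4.0                    # illustrative values only
for omega_d in (-0.5, -3.0, -30.0):
    z0, a0 = critical_point(omega_d, lam, mu, beta)
    residual = max(abs(r) for r in rhs4(z0, a0, omega_d, lam, mu, beta))
    c1, c0 = linearization_coeffs(omega_d, lam, mu, beta)
    print(omega_d, pll_has_critical_point(omega_d, mu), residual,
          c1, lam * (beta + 2.0), c0, lam ** 2 * (beta + 1.0) + omega_d ** 2 / (beta + 1.0))
```

At each mistuning the residual is at rounding level, and each numerical coefficient matches its formula. The phase-lock test fails as soon as $|\omega_d|$ exceeds $\mu$.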

We determine the stability of the critical point (5) by the standard method of linearization. The matrix

$$A = \begin{pmatrix} \dfrac{\partial\dot\zeta}{\partial\zeta} & \dfrac{\partial\dot\zeta}{\partial a} \\[4pt] \dfrac{\partial\dot a}{\partial\zeta} & \dfrac{\partial\dot a}{\partial a} \end{pmatrix} = \begin{pmatrix} -\dfrac{\mu(\beta+1)}{a}\cos\zeta & \dfrac{\mu(\beta+1)}{a^2}\sin\zeta \\[4pt] -\mu\sin\zeta & -\lambda \end{pmatrix}$$

of partial derivatives, evaluated at the critical point, is the matrix appropriate for the linearized system. The determinant of $(sI - A)$ turns out to be

$$s^2 + s\lambda(\beta+2) + \lambda^2(\beta+1) + \frac{\omega_d^2}{\beta+1},$$

with roots in the left half-plane, since all its coefficients are positive. Hence the critical point is stable; in a neighborhood of it the trajectories approach it.

Because of the symmetry of the equations, there is no loss of generality in assuming, as we do henceforth, that $\omega_d < 0$. This convention is used in Figs. 3 and 4.

Although we have not proved it, it is natural (and we conjecture) that a separatrix lies between two families of solutions: those which pass around the origin in the upper half of the $a, \zeta$ plane, and those which just miss the origin as they go past it in the lower half of the plane. Roughly speaking, the former pick up an extra $2\pi$ of phase before settling. This separatrix may even be a "fan" of solutions each of which goes into the origin, although in all likelihood it consists of a single trajectory. This conclusion is supported by a heuristic low-amplitude analysis of equations (4) suggested by J. A. Morrison. He writes (4) as the single equation

$$\frac{d\zeta}{da} = \frac{a\omega_d - \mu(\beta+1)\sin\zeta}{\mu a\cos\zeta - \lambda a^2},$$

and for $|\cos\zeta| \approx 1$ drops terms of order $a^2$, obtaining

$$\mu a\cos\zeta\,\frac{d\zeta}{da} = \mu a\,\frac{d\sin\zeta}{da} = a\omega_d - \mu(\beta+1)\sin\zeta,$$

whence by integration from $a_0$ to $a$

$$\sin\zeta - \frac{\omega_d a}{\mu(\beta+2)} = \left(\frac{a_0}{a}\right)^{\beta+1}\left(\sin\zeta_0 - \frac{\omega_d a_0}{\mu(\beta+2)}\right).$$
[Fig. 3 label: critical point.]

Fig. 3—Phase plane for mistuning. 


This formula suggests that if a point $(a_0, \zeta_0)$ on the circle

$$a = \frac{\mu(\beta+2)}{\omega_d}\sin\zeta,$$

and near the origin, is on a trajectory, then a nearby point on the circle is on the same trajectory. In other words, a trajectory going through the origin does so like the circle above, which is tangent to but outside the circle $\dot\zeta = 0$ with equation

$$a = \frac{\mu(\beta+1)}{\omega_d}\sin\zeta.$$

To avoid difficulties we shall consider only trajectories which are bounded away from the origin.
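Morrison's integration can be checked by substituting the closed form back into the reduced equation $\mu a\,d(\sin\zeta)/da = a\omega_d - \mu(\beta+1)\sin\zeta$. The Python sketch below does this with a central difference. It also confirms that a start on the circle $a = \mu(\beta+2)\sin\zeta/\omega_d$ stays on it, which is the observation used above. The function names and the numbers are illustrative assumptions.

```python
def sin_zeta(a, a0, s0, omega_d, mu, beta):
    """Integrated low-amplitude approximation: sin(zeta) as a function of a, with sin(zeta(a0)) = s0."""
    k = omega_d / (mu * (beta + 2.0))
    return k * a + (a0 / a) ** (beta + 1.0) * (s0 - k * a0)

def residual(a, a0, s0, omega_d, mu, beta, h=1e-6):
    """mu*a*d(sin zeta)/da - (a*omega_d - mu*(beta+1)*sin zeta), using a central difference."""
    ds_da = (sin_zeta(a + h, a0, s0, omega_d, mu, beta)
             - sin_zeta(a - h, a0, s0, omega_d, mu, beta)) / (2 * h)
    s = sin_zeta(a, a0, s0, omega_d, mu, beta)
    return mu * a * ds_da - (a * omega_d - mu * (beta + 1.0) * s)

omega_d, mu, beta = -1.0, 1.0, 3.0        # illustrative values, omega_d < 0 as in the text
a0 = 0.05
# A starting point off the circle a = mu(beta+2) sin(zeta)/omega_d:
print(max(abs(residual(a, a0, -0.3, omega_d, mu, beta)) for a in (0.03, 0.05, 0.08, 0.1)))
# A starting point on that circle: sin(zeta) stays equal to omega_d*a/(mu(beta+2)).
s_on = omega_d * a0 / (mu * (beta + 2.0))
print([sin_zeta(a, a0, s_on, omega_d, mu, beta) - omega_d * a / (mu * (beta + 2.0))
       for a in (0.02, 0.04, 0.06)])
```

Both printed quantities are at the level of finite-difference or rounding error.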
We now address ourselves to the global stability of the mistuned equations (4). Since the system is two-dimensional it is possible to try to use the topological Poincaré-Bendixson methods.⁷ This is most conveniently done by passing to rectangular coordinates $x = a\cos\zeta$, $y = a\sin\zeta$ again, and calculating the divergence. We rewrite (4) as

$$\dot x = -\lambda x - \omega_d y + \mu + \frac{\mu\beta y^2}{x^2+y^2}, \qquad \dot y = -\lambda y + \omega_d x - \frac{\mu\beta xy}{x^2+y^2},$$

which we denote by $(\dot x, \dot y) = v(x, y)$. We find

$$\frac{\partial\dot x}{\partial x} = -\lambda - \frac{2\mu\beta xy^2}{(x^2+y^2)^2}, \qquad \frac{\partial\dot y}{\partial y} = -\lambda - \frac{\mu\beta x(x^2-y^2)}{(x^2+y^2)^2},$$

$$\operatorname{div} v = -2\lambda - \frac{\mu\beta x}{x^2+y^2} = -2\lambda - \frac{\mu\beta}{a}\cos\zeta.$$
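Both the rectangular form and the divergence formula can be checked numerically. The Python sketch below first compares $v(x, y)$ with the projection of $(\dot a, a\dot\zeta)$ from (4). It then compares a finite-difference divergence with $-2\lambda - \mu\beta x/(x^2+y^2)$. The parameter values and sample points are illustrative assumptions.

```python
import math

def v(x, y, omega_d, lam, mu, beta):
    """Equations (4) in rectangular coordinates x = a cos(zeta), y = a sin(zeta)."""
    r2 = x * x + y * y
    return (-lam * x - omega_d * y + mu + mu * beta * y * y / r2,
            -lam * y + omega_d * x - mu * beta * x * y / r2)

def v_from_polar(x, y, omega_d, lam, mu, beta):
    """Same field, obtained by projecting (a_dot, a*zeta_dot) of (4) onto the axes."""
    a, zeta = math.hypot(x, y), math.atan2(y, x)
    a_dot = -lam * a + mu * math.cos(zeta)
    a_zeta_dot = a * omega_d - mu * (beta + 1.0) * math.sin(zeta)
    return (a_dot * math.cos(zeta) - a_zeta_dot * math.sin(zeta),
            a_dot * math.sin(zeta) + a_zeta_dot * math.cos(zeta))

def divergence_fd(x, y, params, h=1e-6):
    """Central-difference divergence of v."""
    dvx_dx = (v(x + h, y, *params)[0] - v(x - h, y, *params)[0]) / (2 * h)
    dvy_dy = (v(x, y + h, *params)[1] - v(x, y - h, *params)[1]) / (2 * h)
    return dvx_dx + dvy_dy

def divergence_closed(x, y, params):
    omega_d, lam, mu, beta = params
    return -2.0 * lam - mu * beta * x / (x * x + y * y)   # = -2*lam - (mu*beta/a) cos(zeta)

params = (-3.0, 1.0, 1.0, 4.0)          # (omega_d, lam, mu, beta), illustrative
for x, y in [(0.5, -0.2), (0.1, -0.9), (-0.7, 0.4)]:
    print(x, y, v(x, y, *params), v_from_polar(x, y, *params),
          divergence_fd(x, y, params), divergence_closed(x, y, params))
```

In the fourth quadrant $x > 0$, so the closed form is below $-2\lambda$. This is the sign the Bendixson criterion needs.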





Fig. 4—Details of the curve K. 
Lemma 1: In the portion of the fourth quadrant comprised by an arbitrary neighborhood of the origin there is always a curve $K$, joining the positive $x$-axis to the negative $y$-axis, such that trajectories of (4) cross $K$ in the outward direction, i.e., out of the part cut off by $K$ near the origin, and into the part separated by $K$ from the origin. (See Fig. 3.)


Proof: Let $K$ consist of the circle $a = \varepsilon$ from the $x$-axis down to the point where this circle crosses $\dot a = 0$, i.e., until $\cos\zeta = \varepsilon\lambda/\mu$. The Cartesian coordinates of this point are

$$x_0 = \frac{\varepsilon^2\lambda}{\mu}, \qquad y_0 = -\varepsilon\sqrt{1 - \frac{\varepsilon^2\lambda^2}{\mu^2}}.$$

From here let $K$ continue to the $y$-axis at slope 1, i.e., let it consist of that portion of the line

$$y = x + y_0 - x_0$$

which is between its intercept $y_0 - x_0$ on the $y$-axis and the circle $\dot a = 0$. Now on $a = \varepsilon$ inside $\dot a = 0$ we have $\dot a > 0$, so on the circular part of $K$ all trajectories are entering $a > \varepsilon$, even at $x_0, y_0$. At $x_0, y_0$ the trajectory is actually tangent to $a = \varepsilon$, but for small enough $\varepsilon$ it is pointed in the direction of increasing $x$, and so there too it must enter $a > \varepsilon$.

On the linear part of $K$ we want to verify that

$$\frac{dy}{dx} = \frac{-\lambda y + \omega_d x - \mu\beta xy/(x^2+y^2)}{-\lambda x - \omega_d y + \mu + \mu\beta y^2/(x^2+y^2)} < 1$$

if $\varepsilon$ is small enough. This is true because on the linear part of $K$ we have $x \to 0$, $y \to 0$, and

$$\cos\zeta = \frac{x}{a} \to 0,$$

all monotonically and uniformly, as $\varepsilon \to 0$. Near $\varepsilon = 0$ the denominator of $dy/dx$ is close to $\mu(\beta+1)$ for $(x, y) \in K$, so on the linear part of $K$ the trajectories are moving in the direction of increasing $x$ at a slope $dy/dx < 1$. Hence they are crossing $K$ in the direction of increasing amplitude. This proves the lemma.


Theorem 2: If $|\omega_d| < \lambda(\beta+1)$, then every trajectory of (4) that is bounded away from the origin approaches the critical point $a = (\mu/\lambda)\cos\zeta$, $\zeta = \tan^{-1}[\omega_d/\lambda(\beta+1)]$.
Proof: It suffices to consider only $\omega_d < 0$. All trajectories outside $a = \mu/\lambda$ have $\dot a < 0$, so it is enough to consider those starting inside, because the others get there eventually. Consider a path starting in $0 \le \zeta \le 3\pi/2$, $\dot\zeta \le 0$, $a \le \mu/\lambda$, and bounded away from the origin. Either it stays in this region forever, or it reaches the fourth quadrant, or else it moves into $\dot\zeta > 0$. If it stays there forever then there is a closed region, free of critical points and excluding the origin, in which it stays. By a result of Poincaré (Ref. 7, p. 232), this closed region also contains a closed path $\gamma$, of period say $T$. Then because $\gamma$ is closed, if $\zeta(0)$ is on $\gamma$,

$$\zeta(T) - \zeta(0) = \int_0^T \dot\zeta(t)\,dt = 0.$$

But this is impossible since $\dot\zeta < 0$ throughout the region in question.

In the third quadrant a trajectory can cross $\dot\zeta = 0$ only once. The argument just given also shows that no path bounded away from the origin can stay in the region $\pi \le \zeta \le 3\pi/2$, $\dot\zeta > 0$, $a \le \mu/\lambda$. Thus all paths starting in the region $a \le \mu/\lambda$ and bounded away from the origin reach the fourth quadrant.

The inequality in the hypothesis implies that the circle $\dot\zeta = 0$ intersects the circle $a = \mu/\lambda$. Away from the origin we have $\dot\zeta < 0$ for $x = 0$, $y > 0$ and $\dot\zeta > 0$ for $x = 0$, $y < 0$. On $0 < x \le \mu/\lambda$, $y = 0$ the trajectories enter the fourth quadrant intersected with $a \le \mu/\lambda$. Since $a$ is nonincreasing on $a = \mu/\lambda$, it follows from Lemma 1 that there is a region $R$ with these properties:


(i) $R$ is closed.
(ii) $R \subset \{a \le \mu/\lambda\} \cap$ fourth quadrant.
(iii) $(\{a \le \mu/\lambda\} \cap \text{fourth quadrant}) - R$ is in an arbitrarily small neighborhood of the origin.
(iv) $R$ is entered by the path under consideration.
(v) $R$ is invariant, i.e., maps into itself under the motion.


Indeed $R$ can be chosen to be a 2-cell (homeomorph of a disc). We note that the divergence is negative throughout $R$. It follows from the criterion of Bendixson (Ref. 7, p. 238) that $R$ contains no limit cycles nor even an oval going to and from a critical point. Since $R$ contains only one critical point, it can contain no path-polygon. Thus two of the three alternatives in Bendixson's theorem (Ref. 7, p. 230) are ruled out, and all paths starting in $R$ go to the critical point. Since we can associate a region like $R$ with any trajectory bounded away from the origin, the theorem is proved.
We remark that when $\lambda = \mu$ the condition of the theorem can be rendered in physical terms as

$$|\text{mistuning}| < (\text{half-power IF bandwidth})(1 + \text{feedback gain}).$$
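Theorem 2 can be illustrated numerically, though not proved. The Python sketch below integrates (4) with a hand-written fourth-order Runge-Kutta scheme. It assumes the illustrative values $\lambda = \mu = 1$, $\beta = 4$, $\omega_d = -3$ (so $|\omega_d| < \lambda(\beta+1)$), and its starting points lie on $a = 2$, outside $a = \mu/\lambda$. The distance to the critical point (5) is measured in the $x, y$ plane, so that extra turns of $2\pi$ in $\zeta$ do not matter.

```python
import math

def rhs4(zeta, a, omega_d, lam, mu, beta):
    """Equations (4); returns (zeta_dot, a_dot)."""
    return omega_d - mu * (beta + 1.0) * math.sin(zeta) / a, -lam * a + mu * math.cos(zeta)

def rk4_trajectory(zeta, a, omega_d, lam, mu, beta, dt=0.002, t_end=30.0):
    """Classical RK4; returns the final (zeta, a) and the smallest amplitude seen."""
    def f(z, r):
        return rhs4(z, r, omega_d, lam, mu, beta)
    a_min = a
    for _ in range(int(round(t_end / dt))):
        k1 = f(zeta, a)
        k2 = f(zeta + 0.5 * dt * k1[0], a + 0.5 * dt * k1[1])
        k3 = f(zeta + 0.5 * dt * k2[0], a + 0.5 * dt * k2[1])
        k4 = f(zeta + dt * k3[0], a + dt * k3[1])
        zeta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        a += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        a_min = min(a_min, a)
    return zeta, a, a_min

def distance_to_critical_point(zeta, a, omega_d, lam, mu, beta):
    z0 = math.atan(omega_d / (lam * (beta + 1.0)))   # critical point (5)
    a0 = (mu / lam) * math.cos(z0)
    return math.hypot(a * math.cos(zeta) - a0 * math.cos(z0),
                      a * math.sin(zeta) - a0 * math.sin(z0))

lam, mu, beta, omega_d = 1.0, 1.0, 4.0, -3.0         # |omega_d| = 3 < lam*(beta + 1) = 5
for zeta_init in (0.0, math.pi / 2, math.pi, -math.pi / 2):
    z, a, a_min = rk4_trajectory(zeta_init, 2.0, omega_d, lam, mu, beta)
    print(round(zeta_init, 3), distance_to_critical_point(z, a, omega_d, lam, mu, beta), a_min)
```

Every run ends at the critical point, and the amplitude stays well away from zero along the way. That is the class of trajectories the theorem covers.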


We next show that a result similar to Theorem 2 can be obtained by a Lyapunov function argument.


Theorem 3: If

$$|\omega_d| < \frac{2\lambda(\beta+1)^{3/2}}{\beta},$$

then every trajectory of (4) that is bounded away from the origin approaches the critical point $a = (\mu/\lambda)\cos\zeta$, $\zeta = \tan^{-1}[\omega_d/\lambda(\beta+1)]$.


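Proof: Consider the scalar function $V$ defined by

$$2V = (\lambda a - \mu\cos\zeta)^2 + (\beta+1)^{-2}\bigl(a\omega_d - \mu(\beta+1)\sin\zeta\bigr)^2 = \dot a^2 + (\beta+1)^{-2}(a\dot\zeta)^2.$$

We find

$$\dot V = -\lambda(\lambda a - \mu\cos\zeta)^2 + \frac{\beta\omega_d}{(\beta+1)^2}(\lambda a - \mu\cos\zeta)\bigl(a\omega_d - \mu(\beta+1)\sin\zeta\bigr) - \frac{\lambda}{\beta+1}\bigl(a\omega_d - \mu(\beta+1)\sin\zeta\bigr)^2.$$

Thus $-\dot V$ is a quadratic form in $\dot a$ and $a\dot\zeta$ with determinant

$$\begin{vmatrix} \lambda & \dfrac{\beta\omega_d}{2(\beta+1)^2} \\[4pt] \dfrac{\beta\omega_d}{2(\beta+1)^2} & \dfrac{\lambda}{\beta+1} \end{vmatrix} = \frac{\lambda^2}{\beta+1} - \frac{\beta^2\omega_d^2}{4(\beta+1)^4},$$

which is positive whenever $|\omega_d| < 2\lambda(\beta+1)^{3/2}/\beta$.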

Consider now a trajectory bounded away from the origin. It is clearly bounded (since $\dot a < 0$ for $a > \mu/\lambda$), so it has a positive limiting set $\Gamma^+$ which is invariant and to which it tends. There is a constant $k$ such that the trajectory is entirely contained in a bounded subregion $\Omega$ of $\{V \le k\}$. Hence $\Gamma^+ \subset \Omega$. Since $V$ is nonincreasing along the trajectory, it tends to a limit, and $V$ takes this limiting value everywhere on the invariant set $\Gamma^+$, so $\dot V = 0$ on $\Gamma^+$. Since $a$ is bounded away from 0 on the trajectory, it follows that $\dot a = 0$ and $\dot\zeta = 0$ on $\Gamma^+$. Thus the trajectory tends to the critical point. (This is a variant of the argument for Theorem VI, p. 58 of Ref. 6.) Again, the condition of the theorem is that $|\omega_d|$ not be too big, viz.,

$$|\omega_d| < 2(\text{half-power IF bandwidth}) \times \frac{(\beta+1)^{3/2}}{\beta},$$

where $\beta$ is the feedback gain.
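The algebra behind Theorem 3 can be spot-checked. The Python sketch below compares $\dot V$ computed by the chain rule along (4) with the quadratic form $-\lambda\dot a^2 - (\beta\omega_d/(\beta+1)^2)\,\dot a\,(a\dot\zeta) - (\lambda/(\beta+1))(a\dot\zeta)^2$ at random points. It also applies the determinant test at two mistunings on either side of the bound. The sampling ranges and the values $\lambda = 1$, $\beta = 4$ are illustrative assumptions.

```python
import math
import random

def vdot_chain_rule(a, zeta, omega_d, lam, mu, beta):
    """Derivative along (4) of V, where 2V = (lam a - mu cos z)^2 + (a w - mu(b+1) sin z)^2/(b+1)^2."""
    b1 = beta + 1.0
    P = lam * a - mu * math.cos(zeta)               # P = -a_dot
    W = a * omega_d - mu * b1 * math.sin(zeta)      # W = a * zeta_dot
    dV_da = lam * P + omega_d * W / b1 ** 2
    dV_dzeta = mu * math.sin(zeta) * P - mu * math.cos(zeta) * W / b1
    return dV_da * (-P) + dV_dzeta * (W / a)

def vdot_quadratic_form(a, zeta, omega_d, lam, mu, beta):
    """The same derivative as a quadratic form in A = a_dot and W = a * zeta_dot."""
    b1 = beta + 1.0
    A = mu * math.cos(zeta) - lam * a
    W = a * omega_d - mu * b1 * math.sin(zeta)
    return -lam * A * A - (beta * omega_d / b1 ** 2) * A * W - (lam / b1) * W * W

def negative_definite(omega_d, lam, beta):
    """Leading-minor and determinant test for -V_dot."""
    b1 = beta + 1.0
    off = beta * omega_d / (2.0 * b1 ** 2)
    return lam > 0 and lam * lam / b1 - off * off > 0

rng = random.Random(2)
for _ in range(1000):
    args = (rng.uniform(0.2, 3.0), rng.uniform(-math.pi, math.pi), rng.uniform(-5.0, 5.0),
            rng.uniform(0.2, 2.0), rng.uniform(0.2, 2.0), rng.uniform(0.1, 5.0))
    x, y = vdot_chain_rule(*args), vdot_quadratic_form(*args)
    assert abs(x - y) <= 1e-8 * (1.0 + abs(x) + abs(y))

lam, beta = 1.0, 4.0
print(2.0 * lam * (beta + 1.0) ** 1.5 / beta,      # Theorem 3 bound, about 5.59
      lam * (beta + 1.0),                           # Theorem 2 bound, 5
      negative_definite(5.5, lam, beta), negative_definite(5.7, lam, beta))
```

With these values the Theorem 3 bound is somewhat larger than the Theorem 2 bound. For larger $\beta$ the ordering reverses, since $2(\beta+1)^{3/2}/\beta$ grows only like $2\sqrt\beta$.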


VI. ONE-POLE FILTER IN THE FEEDBACK 


After the filterless case considered so far, the next simplest model for the FMFB receiver would have one-pole, no-zero filters both as the baseband equivalent $f(\cdot)$ of the IF response and as the response $k(\cdot)$ in the feedback. This is the simplest case that has appreciable practical import. The IF filter corresponds closely to the one-mesh design described by Giger and Chaffee (loc. cit., p. 1119 and Fig. 5, p. 1120). The feedback filter and the gain $\beta$ are a rudimentary version of the dc-and-baseband amplifier sketched by these authors (loc. cit., pp. 1121-22).

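In this case the differential equations for the system are

$$\dot u_c = -\lambda u_c + \mu(x_c\cos\beta\varphi + x_s\sin\beta\varphi),$$

$$\dot u_s = -\lambda u_s + \mu(x_s\cos\beta\varphi - x_c\sin\beta\varphi),$$

$$\dot\theta = a^{-2}(u_c\dot u_s - u_s\dot u_c) = \frac{d}{dt}\tan^{-1}\frac{u_s}{u_c},$$

$$\dot x = -\gamma x + \delta\dot\theta,$$

$$\dot\varphi = x,$$

with $a = (u_c^2 + u_s^2)^{1/2}$ as before, and $\delta/(\gamma + s)$ the transfer function of the feedback filter. Upon setting $\xi = \theta + \beta\varphi$ these simplify to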


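$$\dot\xi = \beta x + \frac{\mu}{a}(x_s\cos\xi - x_c\sin\xi),$$

$$\dot a = -\lambda a + \mu(x_c\cos\xi + x_s\sin\xi),$$

$$\dot x = -\gamma x + \frac{\delta\mu}{a}(x_s\cos\xi - x_c\sin\xi).$$

When the modulating signal is a constant $\omega_d$, then $x_s(t) = \sin\omega_d t$, $x_c(t) = \cos\omega_d t$, and with $\zeta(t) = \omega_d t - \xi(t)$ the equations become

$$\dot\zeta = \omega_d - \beta x - \frac{\mu}{a}\sin\zeta, \qquad \dot a = -\lambda a + \mu\cos\zeta, \qquad \dot x = -\gamma x + \frac{\delta\mu}{a}\sin\zeta. \tag{6}$$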

Let us note heuristically and physically that if $\delta = \gamma \to \infty$ then the feedback bandwidth goes to $\infty$ and we obtain the equations (4) of the simplest case.

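We start with a study of the stability of the critical point $\zeta_0, a_0, x_0$ defined by

$$\zeta_0 = \tan^{-1}\frac{\omega_d}{\lambda(1 + \beta\delta/\gamma)}, \qquad a_0 = \frac{\mu}{\lambda}\cos\zeta_0, \qquad x_0 = \frac{\delta\omega_d}{\gamma + \beta\delta}. \tag{7}$$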


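The matrix $A = (\partial f_i/\partial x_j)$ of partial derivatives evaluated at the critical point (rows $\dot\zeta, \dot a, \dot x$; columns $\zeta, a, x$) is

$$A = \begin{pmatrix} -\lambda & \dfrac{\lambda^2\tan\zeta_0}{\mu\cos\zeta_0} & -\beta \\[4pt] -\mu\sin\zeta_0 & -\lambda & 0 \\[4pt] \delta\lambda & -\dfrac{\delta\lambda^2\tan\zeta_0}{\mu\cos\zeta_0} & -\gamma \end{pmatrix}.$$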


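The determinant of $(sI - A)$ is

$$(s+\lambda)\bigl((s+\lambda)(s+\gamma) + \beta\delta\lambda\bigr) + \beta\delta\lambda^2\tan^2\zeta_0 + \lambda^2\tan^2\zeta_0\,(s+\gamma)$$

$$= s^3 + (2\lambda+\gamma)s^2 + (\lambda^2 + 2\lambda\gamma + \beta\delta\lambda + \lambda^2\tan^2\zeta_0)s + \lambda^2(\gamma + \beta\delta)(1 + \tan^2\zeta_0)$$

$$= s^3 + a_1 s^2 + a_2 s + a_3.$$

A necessary and sufficient condition for stability is that

$$a_1, a_2, a_3 > 0 \quad\text{and}\quad a_1 a_2 > a_3.$$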


The first three are clearly true, and the last amounts to

$$(2\lambda+\gamma)(\lambda^2 + 2\lambda\gamma + \lambda\beta\delta) + (2\lambda+\gamma)\frac{\gamma^2\omega_d^2}{(\gamma+\beta\delta)^2} > \lambda^2(\gamma+\beta\delta) + \frac{\gamma^2\omega_d^2}{\gamma+\beta\delta}.$$

This is symmetric in $\pm\omega_d$, and is true for $|\omega_d|$ small enough. It becomes false for large $|\omega_d|$ if $2\lambda < \beta\delta$. If $\lambda = \mu$ and $\gamma = \delta$, then these numbers are the half-power bandwidths of the IF and feedback filter respectively, and we may say in physical terms the following. If

$$\text{feedback gain } \beta < 2 \times \frac{\text{IF bandwidth}}{\text{feedback bandwidth}} = \frac{2\lambda}{\delta},$$

then a very large mistuning cannot affect the local stability of the system. If instead $\beta$ exceeds twice the ratio of IF to feedback bandwidths, then sufficiently large mistuning will make the system unstable. This result was first observed in an unpublished work (although with some errors) of T. R. Williams.
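The local-stability test above can be sketched in Python. The code computes $a_1, a_2, a_3$ from the formulas in the text and applies the Hurwitz conditions. It also cross-checks the coefficients against a finite-difference Jacobian of (6) at the critical point (7). The parameter values $\lambda = \mu = \gamma = \delta = 1$ are illustrative assumptions. With them, $\beta = 1.5$ satisfies $\beta\delta < 2\lambda$ and $\beta = 3$ does not.

```python
import math

def rhs6(zeta, a, x, omega_d, lam, mu, beta, gamma, delta):
    """Equations (6): one-pole feedback filter delta/(gamma + s)."""
    s = mu * math.sin(zeta) / a
    return omega_d - beta * x - s, -lam * a + mu * math.cos(zeta), -gamma * x + delta * s

def critical_point6(omega_d, lam, mu, beta, gamma, delta):
    """Equation (7), ordered (zeta0, a0, x0)."""
    z0 = math.atan(omega_d / (lam * (1.0 + beta * delta / gamma)))
    return z0, (mu / lam) * math.cos(z0), delta * omega_d / (gamma + beta * delta)

def hurwitz_coeffs(omega_d, lam, beta, gamma, delta):
    """a1, a2, a3 as given in the text, with tan(zeta0) from (7)."""
    T = (gamma * omega_d / (lam * (gamma + beta * delta))) ** 2      # tan^2(zeta0)
    a1 = 2 * lam + gamma
    a2 = lam ** 2 + 2 * lam * gamma + beta * delta * lam + lam ** 2 * T
    a3 = lam ** 2 * (gamma + beta * delta) * (1 + T)
    return a1, a2, a3

def locally_stable(omega_d, lam, beta, gamma, delta):
    a1, a2, a3 = hurwitz_coeffs(omega_d, lam, beta, gamma, delta)
    return a1 > 0 and a2 > 0 and a3 > 0 and a1 * a2 > a3

def coeffs_from_jacobian(omega_d, lam, mu, beta, gamma, delta, h=1e-6):
    """-trace, sum of principal 2x2 minors, -det of a central-difference Jacobian at (7)."""
    p = list(critical_point6(omega_d, lam, mu, beta, gamma, delta))
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        fp = rhs6(*pp, omega_d, lam, mu, beta, gamma, delta)
        fm = rhs6(*pm, omega_d, lam, mu, beta, gamma, delta)
        for i in range(3):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    minors = sum(J[i][i] * J[k][k] - J[i][k] * J[k][i] for i, k in ((0, 1), (0, 2), (1, 2)))
    det = (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
           - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
           + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))
    return -(J[0][0] + J[1][1] + J[2][2]), minors, -det

lam = mu = gamma = delta = 1.0                    # illustrative: lambda = mu, gamma = delta
print(coeffs_from_jacobian(10.0, lam, mu, 3.0, gamma, delta),
      hurwitz_coeffs(10.0, lam, 3.0, gamma, delta))
for beta in (1.5, 3.0):                           # beta*delta below, then above, 2*lambda
    print(beta, [locally_stable(w, lam, beta, gamma, delta) for w in (0.0, 10.0, 20.0)])
```

For $\beta = 1.5$ the test passes at every mistuning tried. For $\beta = 3$ it fails once $|\omega_d|$ is large enough (here between 10 and 20), as the text predicts.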

The global stability of the equations (6) for the case with a one- 
pole in the feedback is a far more difficult topic than the local. Natur- 
ally, as more complicated filters are assumed for the IF and the feed- 
back, the dimension of the problem goes up, and the kind of geometric 
analysis we are using here becomes virtually impossible. In particular, 
the Poincaré-Bendixson theory used earlier is already unavailable in 
three dimensions, and also there seems to be no ready way to prove 
the boundedness of solutions. Nevertheless some information can be 
obtained from the construction of a Lyapunov function for the case of 
no mistuning; all attempts to extend the method to the case of mistun- 
ing have failed. 


Theorem 4: If $\omega_d = 0$ (no mistuning) then every trajectory of (6) that is bounded away from the line $a = 0$ approaches the critical point given by (7).


Proof: Consider the scalar function V defined by 


a ae ye Lu cos” ¢ 
—— re a TI Pain? + 

eh Wey oe ee eas 
V is certainly positive along any trajectory satisfying the hypothesis. 
We find after a lot of elementary calculus that 


$$\dot V = -\dot a^2 - \frac{(\lambda+\gamma)\,a^2\dot\zeta^2}{\lambda+\gamma+\beta\delta}.$$
Since the trajectory assumed in the theorem is bounded away from the line $a = 0$, it is easy to see from the equations (6) that it is bounded. Thus the positive limiting set of this trajectory is a nonempty, compact invariant set $\Gamma^+$, to which it tends. There is a constant $k$ such that the trajectory is eventually in a bounded subregion $\Omega$ of $\{V \le k\}$. Thus $\Gamma^+ \subset \Omega$ and $\dot V = 0$ on $\Gamma^+$. Consider now the largest invariant subset $M$ of $\{\dot V = 0\} \cap \Omega$. Clearly $\Gamma^+ \subset M$. Thus the trajectory tends to $M$. On $M$, $\dot V = 0$; hence since the trajectory is bounded away from $a = 0$, we see also that $\dot a = 0$ and $\dot\zeta = 0$ on $M$. Now the equations $\dot a = 0$, $\dot\zeta = 0$ define a spiral curve $C$ on the cylinder $\lambda a = \mu\cos\zeta$ by the formula

$$x = -\frac{\lambda}{\beta}\tan\zeta,$$

and $M$ is an invariant subregion of this curve, bounded away from $a = 0$ (which $C$ is not). On $C$ the vector field defined by the equations must either vanish, or else point in the $+x$ direction, or else point in the $-x$ direction. This is because there can be no motion in the $a, \zeta$ plane on $\dot a = 0$, $\dot\zeta = 0$. If either of the second two alternatives holds at a point of $C$, that point cannot belong to $M$, because the trajectory through it would move off $C$, and $M \subset C$. Hence $M$ consists of $C$-points at which the field vanishes, i.e., $M = \{\text{critical point}\}$. (Cf. Theorem VI, p. 58 of Ref. 6.)
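The derivative used in this proof, $\dot V = -\dot a^2 - (\lambda+\gamma)a^2\dot\zeta^2/(\lambda+\gamma+\beta\delta)$, can be checked against a concrete $V$. The function below is our own reconstruction, chosen to have exactly that derivative along (6) with $\omega_d = 0$. It is an assumption and may not coincide with the $V$ of the original proof:

$$2V = \frac{\dot a^2}{\lambda} + \frac{(a\dot\zeta)^2}{S} + \frac{(\gamma+\beta\delta)\,\mu^2\sin^2\zeta}{\lambda S}, \qquad S = \lambda+\gamma+\beta\delta,$$

with $\dot a$ and $a\dot\zeta$ expressed through (6). It is nonnegative and, for $\beta > 0$, vanishes only at $(\mu/\lambda, 0, 0)$. The Python sketch below checks the derivative by finite differences at random points; the sampling ranges are arbitrary.

```python
import math
import random

def rhs6_no_mistuning(a, zeta, x, lam, mu, beta, gamma, delta):
    """Equations (6) with omega_d = 0, ordered as (a_dot, zeta_dot, x_dot)."""
    s = mu * math.sin(zeta) / a
    return -lam * a + mu * math.cos(zeta), -beta * x - s, -gamma * x + delta * s

def V_candidate(a, zeta, x, lam, mu, beta, gamma, delta):
    """Hypothetical reconstruction of V (not taken from the original proof)."""
    S = lam + gamma + beta * delta
    a_dot = -lam * a + mu * math.cos(zeta)
    a_zeta_dot = -beta * a * x - mu * math.sin(zeta)
    return 0.5 * (a_dot ** 2 / lam + a_zeta_dot ** 2 / S
                  + (gamma + beta * delta) * mu ** 2 * math.sin(zeta) ** 2 / (lam * S))

def V_dot_numeric(p, params, h=1e-6):
    """Central-difference gradient of V dotted with the vector field."""
    f = rhs6_no_mistuning(*p, *params)
    total = 0.0
    for j in range(3):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        total += (V_candidate(*pp, *params) - V_candidate(*pm, *params)) / (2 * h) * f[j]
    return total

def V_dot_claimed(p, params):
    """-a_dot^2 - (lam + gamma)(a zeta_dot)^2 / (lam + gamma + beta delta)."""
    lam, mu, beta, gamma, delta = params
    a_dot, zeta_dot, _ = rhs6_no_mistuning(*p, *params)
    return -a_dot ** 2 - (lam + gamma) * (p[0] * zeta_dot) ** 2 / (lam + gamma + beta * delta)

def worst_error(trials=300, seed=3):
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        params = tuple(rng.uniform(0.2, 2.0) for _ in range(5))   # lam, mu, beta, gamma, delta
        p = (rng.uniform(0.2, 3.0), rng.uniform(-math.pi, math.pi), rng.uniform(-1.0, 1.0))
        err = abs(V_dot_numeric(p, params) - V_dot_claimed(p, params))
        worst = max(worst, err / (1.0 + abs(V_dot_claimed(p, params))))
    return worst

print(worst_error())
```

The agreement is at the level of finite-difference error. This shows that the stated $\dot V$ is attainable with a nonnegative $V$, though not that it is the author's function.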

Try as we might (and we tried many $V$'s), we have not succeeded in proving a version of Theorem 4 in which there was mistuning. If the same $V$ is used with $\omega_d \ne 0$ as was used in Theorem 4, then it is no longer true that $\dot V \le 0$; thus the results we feel are there still elude proof.


VII. COMPENSATED ATTENUATOR IN FEEDBACK 


In private communication, L. H. Enloe and B. R. Davis have suggested that probably the most important practical FMFB receiver is one having a single-pole IF filter (as its baseband equivalent) and a single-pole, single-zero feedback filter, i.e., one with transfer function

$$\frac{s+a}{s+b}. \tag{8}$$

In the time domain this acts like a delta function plus an exponential. From equations (6) we see that the right-hand side of the equation for $\dot\zeta$ can be thought of as the output of a filter whose response is a delta function plus an exponential. This suggests that the analysis in Section VI can be made to cover the filter transfer (8) as well as the one-pole, because the differential equations are such as naturally to supply the constant in (8) if it is not present.

Writing the transfer function as

$$\frac{s+a}{s+b} = 1 + \frac{c}{s+b},$$

with $c = a - b$, the differential equations for the system become

$$\dot u_c = -\lambda u_c + \mu(x_c\cos\beta\varphi + x_s\sin\beta\varphi),$$

$$\dot u_s = -\lambda u_s + \mu(x_s\cos\beta\varphi - x_c\sin\beta\varphi),$$

$$\dot\theta = a^{-2}(u_c\dot u_s - u_s\dot u_c) = \frac{d}{dt}\tan^{-1}\frac{u_s}{u_c},$$

$$\dot\varphi = \dot\theta + y,$$

$$\dot y = -by + c\dot\theta,$$

with $a = (u_c^2 + u_s^2)^{1/2}$ again. We note that


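$$\dot\theta = \frac{\mu}{a}\bigl(x_s\cos(\beta\varphi + \theta) - x_c\sin(\beta\varphi + \theta)\bigr);$$

now we set $\xi = \beta\varphi + \theta$ and simplify the equations to

$$\dot\xi = \beta y + \frac{(\beta+1)\mu}{a}(x_s\cos\xi - x_c\sin\xi),$$

$$\dot a = -\lambda a + \mu(x_c\cos\xi + x_s\sin\xi),$$

$$\dot y = -by + \frac{c\mu}{a}(x_s\cos\xi - x_c\sin\xi).$$

With the modulating signal a constant $\omega_d$, we have as for equations (6) $x_s(t) = \sin\omega_d t$, $x_c(t) = \cos\omega_d t$, and we can set $\zeta(t) = \omega_d t - \xi(t)$ to obtain the equations

$$\dot\zeta = \omega_d - \beta y - \frac{(\beta+1)\mu}{a}\sin\zeta, \qquad \dot a = -\lambda a + \mu\cos\zeta, \qquad \dot y = -by + \frac{c\mu}{a}\sin\zeta. \tag{9}$$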


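These equations have the same form as (6) except that $c$ replaces $\delta$, $b$ replaces $\gamma$, and there is an extra $(\beta + 1)$ coefficient in the sine term of $\dot\zeta$. The critical point is at

$$a_0 = \frac{\mu}{\lambda}\cos\zeta_0, \qquad \zeta_0 = \tan^{-1}\frac{\omega_d}{\lambda(\beta+1)\bigl(1 + \beta c/(b(\beta+1))\bigr)}, \qquad y_0 = \frac{c\lambda}{b}\tan\zeta_0.$$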
The matrix of partial derivatives evaluated at the critical point is


c a y 
: d’ tan tan [o 
{-@+D @+)> Ee ~s 
a) —psin £5 —r Q|- 
26ST. eta == = 


Bb COS fo 
The appropriate polynomial is 
s+ s'(A(B + 1) + b+) + s(AdG@ + 2) 
+ °(8 + 1)(1 + tan’ &) + cdB(6 + 1) 
+ b(B + 1)d’ tan® f + cd’B(B + 1) + c(B + DB tan’ f 
= 8 + as’ + ass + a. 


A necessary and sufficient condition for local stability is then that $a_1$, $a_2$, and $a_3$ all be $> 0$, and that $a_1 a_2 > a_3$. It is clear that the first three conditions are met whenever $c > 0$, i.e., $a > b$. The case $a = b$ is degenerate and reduces in dimension to the filterless case of Section IV. Also, $a_1$ is always positive. However, since


tan G = Ses 
A(B + n(1 + 8) 
we can write $a_2$ as
a, =Ab+XHN(B+ 1) +AGB+ 1) +8) + (b athe: ty 
If 
(6 — 1 sty) > d+ 6a 


then the sum of the first three terms is negative, and $a_2 < 0$ for $|\omega_d|$ sufficiently small. Similarly


_ loa | (6 + 68) _ 


(142) @4 1 


AM = cn’ B(B+ 1) + 
which is negative for any $\omega_d$ if $b + c\beta < 0$, and is also negative for $|\omega_d|$ sufficiently small if $c < 0$. The case $c < 0$ is physically a bit strange, because it is equivalent to having positive feedback in one loop of the feedback path. Thus it is not surprising that in this case there can be instability even for $\omega_d = 0$.

In the case $c > 0$ only the condition $a_1 a_2 > a_3$ is of concern, and this is

(AB + 24 + B)AB(B + 2) + N°(B + ICL + tan® >) + cdAB(B + 1) 

> (b + Bc)(8B + 1)d tan’ & + cr’B(B + J), 


which simplifies somewhat to 


N7b(8’ + 78 + 9) 
+ (6 + 1)6 + 2) + cdB(B + 1)(b + AB + A) + AVG F 2) 


| wa |” 





> ; (N78 + 1gbe — 4B + 16 + 2). 


@+ (+2) 
Again, this is symmetric in $\pm\omega_d$, and is true for $|\omega_d|$ small enough. It becomes false for $|\omega_d|$ large if

$$\beta c > \lambda(\beta + 2).$$


We can think of the feedback path in this example as consisting of two parallel branches whose outputs are added. One is an amplifier with gain 1. The other is a single-pole filter S with dc gain 1 and (half-power) bandwidth $b$, in series with an amplifier A of gain $c/b$. We can then say in physical terms, assuming $\lambda = \mu$, that if

$$\beta < (\beta + 2) \times \frac{\text{IF bandwidth}}{(\text{S bandwidth}) \times (\text{gain of A})},$$

then even a very large mistuning cannot affect local system stability, but if not, then a sufficiently large mistuning can. The fact that $(\beta + 2)$ appears as a factor on the right shows how much the zero in the compensating attenuator helps prevent instability. This agrees with what has been observed in practice by L. H. Enloe and B. R. Davis (private communication).

The global stability of the equations (9) may be studied by the same methods as were used in Section VI for that of equations (6), but this topic is not pursued further here.
Some Computer Experiments in Picture
Processing for Bandwidth Reduction 


By H. J. LANDAU and D. SLEPIAN 
(Manuscript received January 4, 1971) 


Some computer experiments in processing still pictures for bandwidth reduction are described. In the scheme studied, a picture is partitioned into subpictures, each of which is encoded separately. A subpicture is expressed as a linear combination of a finite set of specially chosen basis subpictures. Quantized versions of the coefficients of this expansion are transmitted as binary digits. Using this procedure, we were able to obtain pictures of good quality using approximately 2 bits per picture element; we were unable to do so at lower bit rates.

Some general comments on the encoding of pictures are included.


I. INTRODUCTION 


This paper describes some computer experiments in picture process- 
ing carried out by us during the winter of 1969 and the spring of 1970. 
Our goal was to explore a particular method for the efficient encoding 
of typical Picturephone® scenes into binary digits. The experiments 
involved still pictures only. We first describe the experiments and their 
results, then follow with some general comments on the encoding of 
pictures. These comments are intended to explain our motivation for 
the particular investigation undertaken. 


II. THE EXPERIMENTS


By means of the TAPEX unit at Murray Hill,1,2 a photograph can
be represented in digital form suitable for handling by the GE-635
computer. Specifically, the picture is scanned from top to bottom
along $n_1$ horizontal lines, the light intensity being sampled $n_2$ times
along each line; every sample is then quantized to the nearest one
of $2^k$ equally spaced amplitude levels, and recorded on a digital tape
as a single $k$-bit integer. Conversely, given the digital tape, each $k$-bit
integer is replaced by its corresponding amplitude value, which is 
then regarded as a sample, taken at the Nyquist rate, of a bandlimited 
waveform. The reconstructed waveform controls the beam intensity of 
successive lines traced by a scanning cathode-ray oscilloscope. The 
oscilloscope face is photographed. 

In all our experiments, the values used for the above quantities were 
$n_1 = n_2 = 256$, and $k = 10$. Figure 1a is an original photograph; Fig.
1b is the result of converting the picture into binary digits on tape 
and reconstructing via TAPEX. Comparison shows that, with the 
parameters as chosen, the digital representation is of customary tele- 
vision quality. It has, of course, the inevitable raster lines.* 

In processing a picture, we first converted each $k$-tuple of binary
digits into the corresponding integer, and subtracted $2^{k-1}$. The
resulting integers, lying in the range $[-2^{k-1}, 2^{k-1} - 1]$, are called
picture-elements, and we denote by $X_{ij}$ the picture-element obtained
from the $j$th sample of the $i$th line of the picture. For computer
processing, we regard a picture as an $n_1 \times n_2$ matrix of
picture-elements. In our experiments, we further partitioned the
$n_1 \times n_2$-element picture by a square grid (as in Fig. 2) into
$n_1 n_2 / m^2$ square subpictures, each having $m$ picture-elements on a
side. These subpictures were encoded independently, one at a time, by
the scheme described below.


We view the $M = m^2$ picture-elements of a subpicture, when read
out row by row from left to right, as the components of an
$M$-dimensional vector $Y$, which represents the subpicture. For example,

$$Y = (X_{11}, X_{12}, \ldots, X_{1m}, X_{21}, \ldots, X_{mm})^{T}$$

is the vector representing the top-left subpicture of Fig. 2. To describe
such vectors, it is natural to introduce a basis. We therefore choose
orthonormal $M$-dimensional basis vectors $b_1, \ldots, b_M$. These remain
fixed, and determine the particular encoding scheme under discussion.
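The bookkeeping just described (offset by $2^{k-1}$, tiling into $m \times m$ subpictures, row-by-row readout into an $M = m^2$ vector) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' GE-635 program. The function name `subpicture_vectors` is hypothetical, and the sketch assumes that $m$ divides both $n_1$ and $n_2$.

```python
import numpy as np

def subpicture_vectors(samples: np.ndarray, k: int, m: int) -> np.ndarray:
    """Return an array of shape (n1*n2/m**2, m*m) with one row Y per m x m subpicture.

    `samples` holds the unsigned k-bit integers read from tape (n1 lines, n2 samples each).
    Subpictures are listed left to right, top to bottom over the grid of Fig. 2.
    """
    n1, n2 = samples.shape
    if n1 % m or n2 % m:
        raise ValueError("m must divide both picture dimensions")
    # Picture-elements X_ij: subtract 2**(k-1) so values lie in [-2**(k-1), 2**(k-1) - 1].
    x = samples.astype(np.int64) - 2 ** (k - 1)
    # Split rows and columns into blocks of m, then bring the two block indices to the front.
    blocks = x.reshape(n1 // m, m, n2 // m, m).transpose(0, 2, 1, 3)
    # Flatten each m x m block row by row (C order) into an M = m*m vector.
    return blocks.reshape(-1, m * m)
```

For the pictures used here ($n_1 = n_2 = 256$, $m = 4$), this yields 4096 subpicture vectors of length 16.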
* (Added in proof.) The half-tone method of picture reproduction used in
printing this article of necessity obscures some of the detail visible in the original
TAPEX photographs. Copies of these photographs will be sent on request.
Fig. 1—(a) Original photograph; (b) Reconstruction of (a) via TAPEX, using
10 bits per picture-element. 


We may now expand $Y$ in terms of the basis vectors, to obtain

$$Y = \sum_{j=1}^{M} c_j b_j, \qquad (1)$$

where, by orthonormality of the $b_j$,

$$c_j = b_j \cdot Y, \qquad j = 1, \ldots, M. \qquad (2)$$

We then quantize $c_j$ into one of $r_j$ different values, denoting by $\hat{c}_j$ the
quantized version of $c_j$. To transmit these quantized coefficients of a
subpicture in the simplest possible way, i.e., by encoding them
independently without exploiting the statistical distribution of their
values, requires $r = \sum_{j=1}^{M} \lceil \log_2 r_j \rceil$ binary digits. We take the number
$R = r/m^2$, which is the number of bits used per picture-element, as a
measure of the efficiency (or bandwidth) of the encoding scheme.

Fig. 2—Partition of pictures into subpictures.
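As a small illustration of the bit count $r$ and the rate $R = r/m^2$, the sketch below computes $\lceil \log_2 r_j \rceil$ exactly with integer arithmetic. The helper names are hypothetical. A coefficient that is dropped altogether can be treated as having $r_j = 1$ quantum, which costs no bits.

```python
def bits_per_subpicture(levels):
    """r: bits needed to send each quantized coefficient independently, i.e. sum of ceil(log2 r_j)."""
    # For an integer r_j >= 1, ceil(log2 r_j) equals (r_j - 1).bit_length(), with no rounding error.
    return sum((r_j - 1).bit_length() for r_j in levels)

def bits_per_picture_element(levels, m):
    """R = r / m**2 for an m x m subpicture."""
    return bits_per_subpicture(levels) / m ** 2

# Hypothetical example: keep only c_1 with 256 levels and drop the other 15 coefficients.
print(bits_per_picture_element([256] + [1] * 15, m=4))   # 0.5
```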

To reconstruct a picture, we suppose the quantized coefficients are
known, and obtain a reconstructed subpicture vector $\hat{Y}$ from the
recipe

$$\hat{Y} = \sum_{j=1}^{M} \hat{c}_j b_j .$$

The components of $\hat{Y}$ are quantized to the nearest integer value in the
range $[-2^{k-1}, 2^{k-1} - 1]$. These are the picture-elements $\hat{X}_{ij}$ of the
reconstituted picture. A photograph is obtained from these values using
the TAPEX unit in the manner already described.
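The reconstruction recipe takes only a few lines. This sketch assumes that the basis vectors are stored as the rows of an $M \times M$ array, and the helper name is hypothetical. Rounding uses NumPy's `rint`, which rounds halves to even; this is one reasonable reading of "nearest integer."

```python
import numpy as np

def reconstruct_subpicture(c_hat, basis, k):
    """Y_hat = sum_j c_hat[j] * b_j, rounded to integers and clipped to the picture-element range.

    `basis` is an (M, M) array whose rows are the orthonormal vectors b_1, ..., b_M.
    """
    y_hat = np.asarray(c_hat) @ basis               # linear combination of the rows b_j
    lo, hi = -2 ** (k - 1), 2 ** (k - 1) - 1        # legal picture-element values
    return np.clip(np.rint(y_hat), lo, hi).astype(np.int64)
```

If the coefficients are not quantized at all, this round trip reproduces the original picture-elements exactly. Any departure from the original therefore comes from the quantization of the $c_j$.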





Fig. 3—(a) 10 bits per picture-element; (b) Reconstruction of (a) by means 
of the Hadamard basis, using 2 bits per picture-element; (c) Reconstruction of 
(a) by means of differential PCM, using 3 bits per picture-element. 
Our experiment consisted of choosing various basis vectors and
various quantization rules for the expansion coefficients $c_j$. We also
experimented with making nonlinear transformations on the picture- 
elements before and after bandwidth compression processing. None of 
the various types of companding we tried yielded better results than 
were obtained without companding. In most of our work, subpictures 
of M = 16 picture-elements (m = 4) were used; a few experiments 
were run with m = 8. 

We were able to obtain pictures of good quality with a rate R = 2 
bits per picture-element, but were unable to do so at lower rates. 
Figure 3a repeats the 10-bit-per-picture-element photograph of Fig.
1b. Figure 3b shows a reconstructed picture with $R = 2$ bits per
picture-element obtained with a scheme using $m = 4$ and the Hadamard
basis described below. We also simulated on the computer the
differential PCM scheme employed in Picturephone coding, which uses
3 bits per picture-element.3 Figure 3c shows the result of this
simulation; it compares favorably with Fig. 3b. Two different subjects are
treated analogously in Figs. 4 and 5.

Although the subpicture encoding achieves a one-third decrease in 
rate, the differential PCM scheme is far easier to instrument. From 
our experience, it seems unlikely that good pictures can be obtained 
with the subpicture scheme at rates much less than 2 bits per picture- 
element. 


III. COMMENTS ON COMPRESSION


To avoid needless complications, in all that follows we shall think of 
a picture in discrete terms, i.e., as a finite collection of
picture-elements, each of which can assume finitely many different values.

How many bits must one use to transmit the picture of Fig. 1b? 
The answer is, of course, zero. It is a single picture. The question 
is not an interesting one. More pertinently, we can ask how many bits 
per picture are required on the average to transmit long strings of 
pictures drawn from a given ensemble of pictures. Since a picture 
source can be regarded as producing sequences of picture elements, 
each of which can assume one of $K$ different values, evidently we can
transmit all possible pictures perfectly by using $\lceil \log_2 K \rceil$ bits per
picture-element. For any reduction of the bit-rate below this value,
we must capitalize on one or both of these facts:


(i) not all pictures are produced with equal probability by the
source, nor are they produced independently;

(ii) the observer does not require all pictures to be reproduced
exactly.
Fig. 4—(a) 10 bits per picture-element; (b) Reconstruction of (a) by means
of the Hadamard basis, using 2 bits per picture-element; (c) Reconstruction of
(a) by means of differential PCM, using 3 bits per picture-element.


The question of how to take advantage of such considerations has 
been much studied in information theory, and methods are known in 
principle for computing the answer. A calculation of the entropy 
of the picture ensemble describes how far it is possible to reduce the 
bit-rate, and still maintain perfect reconstruction, by exploiting source 
redundancies; this minimum rate is determined solely by the statistics 
of the ensemble, and has nothing to do with the nature of pictures, 
vision, or the observer. 

Determination of the entropy of a picture source does not solve the 
problem of real interest to workers in picture transmission; for pic- 
tures, as they are usually presented by a source, have more detail and 
resolution than the observer can utilize. Thus pictures that are counted 
as different in the source ensemble may be indistinguishable to the
viewer who wishes to reconstruct them only to some set limit of
accuracy, i.e., to achieve some “level of fidelity”. The minimum
bit-rate, given as a function of both the particular fidelity criterion
adopted and the source statistics, may also be computed, and is called 
the rate distortion function. As with entropy, this minimum rate is 
achievable only in the limit of more and more complicated encoding 
processes. 

Conceptually, rate distortion theory formulates carefully and answers
completely the foremost question in the TV coder’s mind: “How
many bits do I need?” In actuality, it doesn’t do very much for him
at all. To see why this is so, we must look just a bit closer at the
formalism of rate distortion theory.

Fig. 5—(a) 10 bits per picture-element; (b) Reconstruction of (a) by means
of the Hadamard basis, using 2 bits per picture-element; (c) Reconstruction of
(a) by means of differential PCM, using 3 bits per picture-element.

The general theory presupposes a source that produces infinite
strings of symbols, each symbol being drawn from a $K$-letter alphabet.
The “values” of the letters play no role in the theory, so for
convenience we suppose that they are the integers $1, 2, \ldots, K$. Denote a
typical string produced by the source by $\ldots, X_{-1}, X_0, X_1, X_2, \ldots$,
where each $X$ is one of the integers from 1 to $K$. A measure is placed
upon the set of such infinite strings in such a way that we can regard
the $X$s as random variables and meaningfully ask and answer such
questions as “What is the probability that $X_1 = 8$, $X_2 = 1$, and
$X_3 = 5$?” There are many technicalities involved in specifying this
measure, but they need not concern us here. We are also given a
numerical-valued distortion function $\delta(j, k) \ge 0$ which gives the
distortion when a transmitted letter $j$ is reconstructed as letter $k$.


Let us now consider transmitting the strings produced by the source
by encoding them in the following manner. We break the source strings
up into blocks of $n$ successive symbols. Since each block is composed of $n$
source symbols and each symbol can be one of $K$ different integers,
there are $B = K^n$ different blocks possible. We suppose that a dictionary
is provided, which lists for each one of these $B$ blocks a special block
called its representative block. As successive strings of $n$ source letters
are produced by the source, each is looked up in the dictionary and
encoded into its representative block. If the letters of a block are
$x_1, x_2, \ldots, x_n$, and the letters of the corresponding representative block
taken from the dictionary are $y_1, y_2, \ldots, y_n$, we take the quantity
$D = \sum_{i=1}^{n} \delta(x_i, y_i)$ as the distortion per block. We take
$d = (\text{average } D)/n$ as the level of distortion achieved with the given code book, where
the average is over all source strings.

Let us now fix the number of representative blocks in the dictionary
at $2^L$. Some code books translating the $B$ blocks into $2^L$ representative
blocks will yield smaller values for the distortion $d$ than will others.
We denote by $d(L, n)$ the smallest distortion obtainable by any such
code book. Note now that since there are only $2^L$ representative blocks
in these code books, we could use $L$ binary digits to transmit each
representative block name to a destination. We would achieve distortion
$d(L, n)$ and be transmitting at a rate

$$R = L/n \ \text{bits per source symbol}.$$
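The following brute-force sketch makes the definitions of $D$, $d$, and $d(L, n)$ concrete by searching every code book with $2^L$ representative blocks for a toy source. The source (independent, equiprobable letters) and the Hamming distortion ($\delta(j, k) = 0$ if $j = k$, else 1) are our own assumptions. They were chosen only because they keep the search tiny; nothing here is specific to pictures, and the helper names are hypothetical.

```python
import itertools

def best_block_distortion(K, n, L, prob, delta):
    """d(L, n): smallest per-letter distortion over all code books with 2**L representative blocks.

    prob(block) gives the probability of an n-letter source block; delta(j, k) is the
    per-letter distortion. Exhaustive search, so only tiny K, n, L are feasible.
    """
    blocks = list(itertools.product(range(1, K + 1), repeat=n))   # all B = K**n blocks

    def D(x, y):
        # Distortion per block: sum of delta(x_i, y_i) over the n letters.
        return sum(delta(a, b) for a, b in zip(x, y))

    best = float("inf")
    for reps in itertools.combinations(blocks, 2 ** L):
        # For a fixed set of representatives, the best dictionary maps each block to
        # its closest representative; average D over blocks, then divide by n.
        avg_D = sum(prob(x) * min(D(x, y) for y in reps) for x in blocks)
        best = min(best, avg_D / n)
    return best

def prob(block):
    # Assumed toy source: independent, equiprobable binary letters.
    return 0.5 ** len(block)

def hamming(j, k):
    # Assumed per-letter distortion: 0 if the letter is reproduced correctly, else 1.
    return 0 if j == k else 1

print(best_block_distortion(K=2, n=2, L=1, prob=prob, delta=hamming))   # 0.25 at R = L/n = 1/2
```

With $K = 2$, $n = 2$, and $L = 1$ (so $R = 1/2$), the best code book gives $d = 0.25$. For this particular source, the rate distortion function is known to be $R(d) = 1 - H(d)$, where $H$ is the binary entropy. At $R = 1/2$ that function allows $d \approx 0.11$, and the gap between the two values is why $d(R)$ is defined as a limit over ever longer blocks.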


Now fix $R$ and write $d(R) = \lim_{n \to \infty} d(nR, n)$. This function gives
the smallest distortion obtainable for a fixed binary rate $R$ that can be
had in the limit of arbitrarily large code books. The inverse function
$R(d)$, which gives the smallest binary rate per source letter that will
yield a given distortion $d$, is called the rate distortion function. Information
theory shows how $R(d)$ can be calculated in principle from the symbol
distortion function $\delta(j, k)$ and the measure assigned to the source. We
do not display these complex formulas here.

How can we apply this to picture transmission? There are two obvious
ways of identifying the source symbol $X$ with a quantity of
interest in picture coding. The method most satisfying conceptually is to
identify the random variable $X$ with an entire picture. This is possible
since there are only finitely many different pictures, due to our
assumption that a picture is composed of $n_1 \times n_2$ picture-elements, each taking
one of $2^k$ values. We number the possible pictures and take the picture
numbers as the values of $X$. The distortion function $\delta(j, k)$, which we
must now describe to apply the theory, measures our dissatisfaction at
having picture $j$ reproduced as picture $k$. Conceptually such a measure
exists, but we know little about it. In our experiments, we would have to
prescribe it for $(2^{10 \cdot 256 \cdot 256})^2$ pairs of values of $j$ and $k$.

To compute a value for the rate distortion function, we require in
addition a measure on the source symbols: at a minimum, this involves
assigning a probability distribution, bearing some relation to what will
be observed in practical transmission, to the $2^{10 \cdot 256 \cdot 256} \approx 10^{197{,}283}$
different possible pictures. This task seems quite beyond us now. To
obtain a histogram empirically is out of the question: at 30 frames per
second, one sees only about $10^9$ frames per year, and if the different possible
pictures were run off in sequence at this rate, it would take about $10^{197{,}274}$
years to view them all. On the other hand, to specify the distribution
theoretically requires more understanding of the situation than we now have.
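The counts quoted above follow from simple arithmetic with logarithms, because $2^{655{,}360}$ is far too large to be useful when printed in full. The short check below assumes a 365-day year of continuous viewing at 30 frames per second.

```python
import math

n1 = n2 = 256          # lines and samples per line used in the experiments
k = 10                 # bits per picture-element
bits_per_picture = k * n1 * n2                       # 655,360 bits
log10_pictures = bits_per_picture * math.log10(2)    # number of pictures is 2**655360
frames_per_year = 30 * 60 * 60 * 24 * 365            # 30 frames/s over a 365-day year
log10_years = log10_pictures - math.log10(frames_per_year)

print(f"distinct pictures ~ 10^{math.floor(log10_pictures)}")   # 10^197283
print(f"frames per year   ~ {frames_per_year:.2e}")              # 9.46e+08
print(f"years to view all ~ 10^{math.floor(log10_years)}")      # 10^197274
```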

Indeed, being able to describe a reasonable distribution for the pos- 
sible pictures goes a long way towards solving the problem of efficient 
coding. We suspect that a reasonably good description would assign 
probability 1/N to each of N of the pictures, and zero to the rest, with 
$N$ small indeed compared to $10^{197{,}283}$. If we could describe this set well,
we could encode using $\log_2 N$ bits/picture. But which are the “likely”
pictures? For Picturephone service or commercial broadcast television, 
intuition suggests that chaotic pictures, in which adjacent picture- 
elements jump about between extreme values, would be classified as 
unlikely. Likely pictures are, roughly speaking, made up of regions of 
nearly constant brightness. The brightness might change considerably 
from one region to the next, but there cannot be too many small re- 
gions, or we are back to unlikely chaos, nor can the boundaries of the 
region be too wild or fast-turning. However, the enormous number of 
possibilities involved prevents an accurate description. 

A more tractable application of the general theory comes from
associating the source symbol $X$ with a picture-element. Now, however, the
distortion function $\delta(j, k)$ measures our displeasure at having the $j$th
level of brightness for a picture-element reproduced as the $k$th level of
brightness. This is an excessively local measure of picture fidelity and 
is probably quite remote from the criteria used by human observers. 

In summary, rate distortion theory tells us that to encode efficiently 
we must pay attention to the more likely pictures (or sequences of 
pictures), and that we must replace these in groups by representative 
values which yield an acceptable distortion. The theory tells us how 
to calculate the minimum bit-rate needed to achieve a given level of 
fidelity, but to carry out the calculation we need to know the
distortion function $\delta(j, k)$ and the measure that gives a statistical
description of the source. In picture coding we have at present very meager
knowledge concerning these quantities. Any new understanding of 
either will undoubtedly lead to improved practical coding schemes. It 
will take a great deal of understanding, however, to know these quanti- 
ties well enough to allow a calculation of a rate distortion function in 
which one can have much confidence. 


IV. MOTIVATION FOR THE EXPERIMENTS 


As we have argued, we lack the information required to bring the 
full force of rate distortion theory to bear on picture coding. Neverthe- 
less, it was the general approach of rate distortion theory that led to 
the encoding scheme of our experiments. We wanted an encoding pro- 
cedure that also would derive from considerations of likelihood and 
fidelity, but that would be manageable in practice. Accordingly, we 
began by focusing on subsections of the picture. 

Although we cannot characterize adequately those entire pictures 
that are likely, perhaps we can do so for subpictures. How large must 
a section of a TV picture be before we can describe it as a likely
subpicture or an unlikely subpicture? If we look at a single
picture-element, every value is “likely.” If we look at two adjacent
picture-elements, again we must say that any pair of values is “likely.” If we
consider square subpictures of $m \times m$ picture-elements, most observers
feel that for $m = 4$ they can already classify some subpictures as likely
parts of the Picturephone or TV ensemble and others as less likely.
The 4 × 4 checkerboard pattern, where adjacent picture-elements
oscillate between extreme values, seems unlikely; the uniform 4 × 4
subpicture seems highly likely.

We decided then to break a picture into $m \times m$ subpictures and to
encode each subpicture independently. If $m$ is large enough, not much
compression potential will be lost by neglecting the correlation between


COMPUTER PICTURE PROCESSING 1535 


subpictures, for the chaos of structure that we intuitively feel to be 
unlikely in the TV ensemble is of a within-subpicture scale. The factor 
driving us to choose a small value of m is the need for a reasonable 
number of pictures to distinguish among on a probabilistic basis. 

Even with $m = 4$ and $k = 10$, there are $2^{10 \cdot 16} = 2^{160} \approx 10^{48}$ different
subpictures, so that it is out of the question to use a Huffman-Fano code,
or other dictionary-like code, to take advantage of the unequal proba- 
bilities of the various subpictures. We seek some other scheme. A 
natural idea here is to represent the subpicture in terms of some coordi- 
nates that can be treated independently and that are related to the 
probability measure on the subpictures. 
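The same kind of check works for a single $4 \times 4$ subpicture with $k = 10$:

```python
import math

m, k = 4, 10
n_subpictures = 2 ** (k * m * m)        # each of the 16 picture-elements takes 2**10 values
print(n_subpictures == 2 ** 160, math.floor(math.log10(n_subpictures)))   # True 48
```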

We begin by interpreting the subpicture as a vector, in the manner
described in Section II. Suppose that an $M$-dimensional subpicture
vector $Y$ is to be expanded on $M$ linearly independent basis vectors $b_j$ as
in equation (1), but that only $J < M$ of the $c_j$ will be used (exactly) to
reconstruct an approximation $\hat{Y}$ to $Y$. Thus

$$Y = \sum_{j=1}^{M} c_j b_j, \qquad \hat{Y} = \sum_{i=1}^{J} c_{a_i} b_{a_i},$$

where $a_1, a_2, \ldots, a_J$ are distinct integers from the list $1, 2, \ldots, M$.
What basis vectors should be used, and which $J$ coefficients retained, in
order to minimize the mean squared error between $Y$ and $\hat{Y}$? The
answer to this problem is well known. Let $Y = (y_1, y_2, \ldots, y_M)$ and
denote the covariances of the components of $Y$ by $\rho_{ij} = E\, y_i y_j$. Let $\rho$ be
the $M \times M$ matrix with elements $\rho_{ij}$. The basis vectors that solve
the above problem are the eigenvectors of $\rho$ having the $J$ largest
eigenvalues. Basis vectors chosen in this way are known as a
Karhunen-Loève basis. The fidelity criterion implicit here is that of
mean-square error, one not very adequate when applied to pictures.
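Here is a minimal sketch of the Karhunen-Loève construction. It assumes that the subpicture vectors are the rows of a NumPy array and replaces the expectation $E\, y_i y_j$ by an average over the available subpictures. The function names are hypothetical. The mean squared error of the truncated reconstruction equals the sum of the discarded eigenvalues, and this is the property that makes the basis optimal.

```python
import numpy as np

def karhunen_loeve_basis(Y: np.ndarray, J: int):
    """Return (basis, eigenvalues) for the J largest eigenvalues of rho_ij = E[y_i y_j].

    Y has one subpicture vector per row (shape N x M); the expectation is replaced by
    an average over the N subpictures, as in an empirical determination.
    """
    rho = Y.T @ Y / Y.shape[0]                  # M x M matrix of second moments
    eigvals, eigvecs = np.linalg.eigh(rho)      # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:J]       # indices of the J largest eigenvalues
    return eigvecs[:, order].T, eigvals[order]  # rows of the first array are the basis vectors

def truncated_reconstruction(Y: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Keep the J coefficients c = b . Y exactly and drop the rest (Y_hat in the text)."""
    return (Y @ basis.T) @ basis
```

When every $\rho_{ij}$ is close to 1, all eigenvalues but one are close to 0, and the computed eigenvectors for those near-zero eigenvalues can change greatly with small changes in the data. This is the instability described next.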

We began our experiments by finding the Karhunen-Loève basis
for the picture of Fig. 1, determining empirically the 16 × 16
covariance matrix for the 4 × 4 subpictures. We discovered, as
expected, that the $\rho_{ij}$ were all extremely close to 1, expressing the high
likelihood of uniform brightness over so small a square. When each
$\rho_{ij} = 1$, the eigenvalue problem is degenerate: the first eigenvector has
all components equal and an eigenvalue of $M = 16$, while the remaining
eigenvectors are indeterminate and correspond to an eigenvalue of
0. Thus we felt no great confidence in our determination of the
Karhunen-Loève vectors, believing that it was probably unstable, and
turned instead to the following intuitive justification for introducing
another basis, which we call the Hadamard basis.


Intuitively, a subpicture with constant brightness for all
picture-elements is a very likely subpicture. Part (1) of Figure 6 depicts a
4 × 4 array of picture-elements all having value +1/4. Another likely
subpicture has a vertical edge running down its middle. Part (2) of
Figure 6 depicts such a case, where the picture-elements on the left
have value +1/4 and those on the right have value −1/4. If now we
form a basis vector $b_1$ from Fig. 6(1) and $b_2$ from Fig. 6(2), we see
that linear combinations $Y = c_1 b_1 + c_2 b_2$ give all possible subpictures
having a center vertical transition between two regions of uniform
brightness.

Continuing this train of thought, we are led to seek 16 linearly 
independent subpictures of decreasing likelihood that will serve as a 
basis on which to expand an arbitrary subpicture. Such a basis, chosen 
so that the vectors are orthonormal, is shown in Fig. 6. If each
subpicture $Y$ of a large picture is expanded on this basis, so that

$$Y = \sum_{j=1}^{16} c_j b_j,$$

we would expect frequently to find the higher coefficients, say $c_{10}$, $c_{11}$,
etc., to have values near zero. The coefficient $c_1$, which gives the average
brightness of the subpicture, would be expected to have a large variance;
higher coefficients, a much smaller variance. Table I lists the ratios
$g_j = \sigma_j/\sigma_1$, where $\sigma_j$ is the variance of $c_j$, as determined empirically
from all the subpictures of Fig. 1b. The results agree remarkably well
with intuition. That $c_1$, which has more than ten times the variance of
any other coefficient, does indeed contain a great deal of the essence
of the original picture is seen from Fig. 7b, which shows the picture
resulting from the reconstruction $\hat{Y} = c_1 b_1$, next to the original
photograph 7a. Figure 7b allows one to see clearly the size of the subpictures
used in the experiments.

Fig. 6—The Hadamard basis.

TABLE I—COEFFICIENT VARIANCES

 j     g_j        j     g_j
 1     1.00       9     0.024
 2     0.098     10     0.024
 3     0.087     11     0.020
 4     0.035     12     0.022
 5     0.038     13     0.019
 6     0.051     14     0.015
 7     0.048     15     0.016
 8     0.034     16     0.014
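One way to generate sixteen orthonormal $4 \times 4$ patterns with entries $\pm 1/4$ is to take outer products of the rows of a $4 \times 4$ Hadamard matrix. The sketch below does this and also computes variance ratios of the kind listed in Table I. The ordering of the patterns is our own choice and need not match Fig. 6, although its first two patterns are the constant $b_1$ and the vertical-edge $b_2$ described above. The helper names are hypothetical.

```python
import numpy as np

# Rows of a 4-point Hadamard matrix, ordered by number of sign changes (0, 1, 2, 3).
H4 = np.array([[1,  1,  1,  1],
               [1,  1, -1, -1],
               [1, -1, -1,  1],
               [1, -1,  1, -1]])

def hadamard_basis_4x4() -> np.ndarray:
    """16 orthonormal 4 x 4 patterns with entries +-1/4, each flattened row by row.

    Pattern (u, v) varies down the rows as H4[u] and across the columns as H4[v];
    this ordering is an assumption, not necessarily the order used in Fig. 6.
    """
    pats = [np.outer(H4[u], H4[v]) / 4 for u in range(4) for v in range(4)]
    return np.array([p.ravel() for p in pats])   # shape (16, 16); rows are b_1..b_16

def variance_ratios(Y: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """g_j = var(c_j) / var(c_1) over all subpicture vectors (rows of Y), as in Table I."""
    c = Y @ basis.T                  # c_j = b_j . Y for every subpicture
    v = c.var(axis=0)
    return v / v[0]
```

Applied to real subpicture vectors, `variance_ratios` would give numbers comparable to Table I. Because $b_1$ is constant, $c_1$ equals 4 times the mean picture-element value of the subpicture, which is why it carries the average brightness.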

Our choice of the Hadamard basis is thus dictated by plausible
guesses about the probabilities of subpictures. Furthermore, it is not
inconsistent with the Karhunen-Loève procedure, since the correlation
matrix is so nearly singular. Finally, it has an important practical
advantage: since its components each have value ±1/4, the computation
of the coefficients can be carried out by simple switching. In our
experiments, we found the results of processing with the Karhunen-Loève
basis to be no better than those obtained with the Hadamard basis,
and so, for reasons of simplicity, judged the latter to be superior.
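The "simple switching" remark can be illustrated with a fast Walsh-Hadamard transform. Every coefficient of a $4 \times 4$ subpicture is obtained from the integer picture-elements by additions and subtractions alone, followed by a single scaling by 1/4, so the coefficients come out as integer multiples of 0.25. This pure-Python sketch uses the natural (Sylvester) ordering of the Hadamard rows. That ordering is an assumption and need not match Fig. 6, and the helper names are hypothetical.

```python
def fwht4(v):
    """4-point Walsh-Hadamard transform in natural (Sylvester) order, using only + and -."""
    a, b, c, d = v
    s0, s1 = a + b, a - b            # first butterfly stage
    s2, s3 = c + d, c - d
    return [s0 + s2, s1 + s3, s0 - s2, s1 - s3]   # second butterfly stage

def hadamard_coefficients(block):
    """All 16 coefficients of a 4 x 4 subpicture given as a list of 4 rows.

    Entry [u][v] is the coefficient of the pattern that varies down the rows as Hadamard
    row u and across the columns as Hadamard row v. The only multiplication is the final 1/4.
    """
    rows = [fwht4(r) for r in block]                 # transform along each row
    cols = [fwht4(col) for col in zip(*rows)]        # then along each column
    return [[cols[v][u] / 4 for v in range(4)] for u in range(4)]

block = [[4 * i + j for j in range(4)] for i in range(4)]   # picture-elements 0..15
print(hadamard_coefficients(block)[0][0])                    # 30.0 = (sum of elements) / 4
```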

Since our object is to reduce the bit-rate, we must adopt some scheme
of quantization for the coefficients. This will lead to an approximate
reconstruction of the subpicture, and considerations of fidelity must
guide us in our choice of rules. One such quantization scheme, keeping
some of the coefficients exactly and dropping the remainder altogether,
gives rise to the Karhunen-Loève problem. We adopted a
different procedure, based on quantizing successive coefficients more
and more coarsely. Two arguments led us to this. Firstly, since the
lower coefficients have more variability, reproducing these more
accurately helps reduce the mean-square error for the more probable
pictures. Secondly, the higher coefficients tend to be large mainly when the
subpicture has a very “busy” or chaotic nature; we guessed the detail
of that chaos to be less important to the viewer than the existence of
chaos. Thus the fidelity criterion behind our encoding contains an
element of the characteristics of observers, in addition to considerations
of mean-square error.

Fig. 7—(a) Original photograph; (b) Reconstruction of (a) by means of $c_1$ only.

TABLE II—QUANTIZATION OF COEFFICIENTS

Much experimenting bore out the general truth of these suppositions.
Table II gives the number of quanta, $r_j$, used for $c_j$ in Figs. 3b, 4b, and
5b. The quantization of a given $c$ was carried out by dividing its range
into disjoint intervals whose endpoints are called cut points and by
associating with each interval a representative value. The quantized
value of $c$ is the representative value associated with the interval in
which $c$ lies. Table III lists the cut points and representative values
used to obtain Figs. 3b, 4b, and 5b.
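The quantizers of Table III can be sketched as follows, with the numbers copied from that table and hypothetical helper names. The contents of Table II did not survive in this copy, so the quanta $r_j$ below are our inference from Table III: 64 intervals for $c_1$, and twice the number of positive representative values for each symmetric quantizer. With these counts the bit budget comes to 32 bits per $4 \times 4$ subpicture, which matches the stated rate $R = 2$. The tabulated cut points also lie close to the curve $kx^2$ at integer $x$, with $k = 8.6$, 15.6, and 62.5 for the three groups. The representative values lie close to the same curves at half-integer $x$.

```python
import bisect
import math

# Positive halves of the symmetric quantizers of Table III (values copied from the table).
C23_CUTS, C23_REPS = [0, 8.6, 34.4, 77.4, 137.5, 215, 309, 421], [2.1, 19.3, 53.7, 105, 174, 260, 363, 483]
C47_CUTS, C47_REPS = [0, 15.6, 62.4, 140.4], [3.9, 35.2, 97.6, 191]
C810_CUTS, C810_REPS = [0, 62.5], [15.6, 141]

def quantize_symmetric(c, cuts, reps):
    """Representative value of the interval containing |c|, with the sign of c restored."""
    i = bisect.bisect_right(cuts, abs(c)) - 1   # index of the last cut point <= |c|
    return math.copysign(reps[i], c)

def quantize_c1(c):
    """64 intervals of length 60 on [-1940, 1900]; midpoints represent them, end intervals absorb overflow."""
    i = min(max(math.floor((c + 1940) / 60), 0), 63)
    return -1910 + 60 * i

def quantize_coefficients(c):
    """Quantize c_1..c_16 (list indices 0..15) per Table III; c_11..c_16 are dropped (set to 0)."""
    out = [quantize_c1(c[0])]
    out += [quantize_symmetric(v, C23_CUTS, C23_REPS) for v in c[1:3]]
    out += [quantize_symmetric(v, C47_CUTS, C47_REPS) for v in c[3:7]]
    out += [quantize_symmetric(v, C810_CUTS, C810_REPS) for v in c[7:10]]
    return out + [0.0] * 6

# Quanta r_j implied by Table III (our inference); a dropped coefficient counts as r_j = 1.
levels = [64] + [2 * len(C23_REPS)] * 2 + [2 * len(C47_REPS)] * 4 + [2 * len(C810_REPS)] * 3 + [1] * 6
bits = sum((r - 1).bit_length() for r in levels)    # sum of ceil(log2 r_j), exact for integers
print(bits, bits / 16)                               # 32 2.0
```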

We carried out over 100 experiments in which the $r_j$, the cut points,
and representative values were varied over considerable ranges; details 
are available on request. Although we ultimately settled on the con- 
figuration described in Tables II and III, the number of possibilities 
to be explored is so large that we have no great confidence that we 
have found the best values for the parameters. On the other hand, 
based on our experience we would judge it unlikely that significant 
improvements can be made with this scheme by further changes of 
parameter values. 


V. PICTURE REPRODUCTION AND QUALITY JUDGMENT*

The development of ordinary photographic film and the
characteristics of analog devices such as scanners and picture tubes
are sensitive to many parameters and can vary noticeably over time.
The result is that in processing and reproducing photographs it is
extremely difficult to maintain rigid control of contrast and average
gray level. Yet these two quantities strongly influence the viewer’s
judgment of the quality of a picture.

* (Added in proof.) The comments of this section refer to the original TAPEX
photographs, rather than to their reproductions in this article. See footnote on
p. 1526.

TABLE III—CUT POINTS AND REPRESENTATIVE VALUES USED FOR QUANTIZING THE COEFFICIENTS $c_1, \ldots, c_{16}$

Possible values for the $c_j$ lie in the range ±2048.0 and are integer multiples of 0.25.

$c_1$: The range between −1940 and +1900 was divided into 64 intervals of length 60, the midpoint of each serving as
representative point for the interval. The first and last representative points also represented any values of $c_1$ outside
this range.

The cut points and representative values for $c_2$ through $c_{10}$ were generated from curves of the form $y = kx^2$, as follows:

$c_2, c_3$:                  Cut points      ±0, 8.6, 34.4, 77.4, 137.5, 215, 309, 421
                             Repr. values    ±2.1, 19.3, 53.7, 105, 174, 260, 363, 483
$c_4, c_5, c_6, c_7$:        Cut points      ±0, 15.6, 62.4, 140.4
                             Repr. values    ±3.9, 35.2, 97.6, 191
$c_8, c_9, c_{10}$:          Cut points      ±0, 62.5
                             Repr. values    ±15.6, 141
$c_{11}, \ldots, c_{16}$:    dropped

In our subpicture encoding scheme, information about overall
contrast and gray level is contained almost entirely in the values of $c_1$.
Our experiments convinced us that the quantization of
this coefficient, as described by Tables II and III, is sufficiently
fine to render these characteristics faithfully. We therefore believe that
whatever variation in contrast exists among Figs. 1, 3, 4, 5, and 7 is
attributable to vagaries of the reproduction process and not to failures of the
encoding, which would show up as inaccuracies in edges and texture.
Accordingly, the reader should attempt to subtract out the
differences in contrast among the photographs and should judge the quality
of our scheme by examination of detail in Figs. 3b, 4b, and 5b.


VI. RELATED WORK 


At about the same time that the present experiments were carried
out, rather similar investigations were independently conducted
elsewhere by other workers.4,5 While related to our work, these studies
differ from it somewhat in detail of execution and very much in
theoretical approach.
Computer Synthesis of Speech by
Concatenation of Formant-Coded Words
Speech signals can be described in terms of the resonances of the
vocal tract. These resonances, or formants, change at rates comparable
to the motions of the vocal tract. They therefore can be sampled and
quantized to low bit-rates, and hence constitute an economical form
for digital storage of speech information. Formant coding also
permits flexible arrangement of speech elements into various contexts.
This report describes a computer technique for synthesizing
continuous messages by concatenating formant data for word-length
utterances. The stored data for the synthesis corresponds to a bit-rate of
533 b/s. A Honeywell DDP-516 computer is used to experimentally
evaluate a voice response system. In an initial application, the system
is used to synthesize 7-digit telephone numbers. To assess the
synthesis, an interactive dialing experiment, also conducted by the
computer, is described. The results show the synthesized numbers to be
comparable in communicative effectiveness to naturally spoken digits.


I. INTRODUCTION 


If computers could speak with sophisticated vocabularies they could 
provide a variety of automatic information services. Machines could 
be interrogated from conventional Touch-Tone® telephones and stored 
data could be accessed by voice. 

Naturally spoken speech messages can of course be prerecorded and 
stored. However, the digital storage required for sizeable amounts of 
natural speech is inordinate. Further, elements of natural speech in 
one context cannot be realistically assembled into a different message. 
With individual pieces of the signal waveform there is no practical 
way of making natural transitions from one element to the next. In 
certain messages of highly limited context—notably the Automatic 
Intercept System—individual words are adequately abutted by having 
more than one spoken version of each word. In general, however,
sentence-length material cannot be satisfactorily produced in this 
manner. 

For answer-back purposes, requiring sizeable vocabularies, an effi- 
cient means of storing and accessing speech information is required. 
This requirement implies low bit-rate representation of vocabulary 
elements and a flexible means for assembling the vocabulary elements 
into any message specified by the answer-back program. Toward this 
requirement, we have devised a synthesis method based upon formant- 
coded vocabulary elements. 

Formants are the resonances, or eigenfrequencies, of the vocal tract. 
They change at relatively slow rates, comparable in speed to the ar- 
ticulatory motions. Their variations with time can consequently be 
sampled and quantized to low bit-rates. Furthermore, this description 
of the speech signal permits separation of information about vocal- 
tract excitation (i.e., voiced/unvoiced distinctions and voice pitch) 
from the resonance information. The formant description therefore 
provides a flexible means for smoothly assembling vocabulary ele- 
ments into connected speech. 

Toward this goal of low bit-rate storage and flexible assembly of 
computer speech, we have implemented and experimentally evaluated 
a formant-synthesis answer-back system. In the subsequent discussion 
we outline principles of the implementation and offer results of an ini- 
tial application to the synthesis of telephone numbers. 


II. SYNTHESIS MODEL 


The model for formant synthesis of speech is shown in Fig. 1. The 
voiced sounds of speech (i.e., those generated by vocal-cord vibra- 
tion) are produced by the upper branch of the system. An impulse 
generator produces a sequence of impulses whose spacing is controlled 
by the “pitch period” parameter, P, which corresponds to the period 
of vocal cord vibration. This impulse train is modulated in amplitude
by a control parameter, $A_V$, which represents the intensity of voiced
sounds. The resulting signal excites a time-varying digital filter
composed of four cascaded resonators. Three of the resonators have
time-varying resonant frequencies ($F_1$, $F_2$, $F_3$), which correspond to the
first three resonances, or formants, of the vocal tract. The output of
this system is passed to a fixed, second-order digital filter which
approximates the source spectrum and mouth radiation characteristics
of human speech.*

Fig. 1—Digital formant synthesizer, block diagram.


Unvoiced speech is produced by the lower branch of the system in 
Fig. 1. A random number generator, representative of the fricative 
noise source in unvoiced speech, produces samples of uniformly dis- 
tributed white noise. The noise amplitude is modulated by the control 
parameter, $A_N$, which represents the intensity of unvoiced sounds.
This signal excites another time-varying digital filter composed of one
time-varying resonator ($F_P$) and one time-varying antiresonator ($F_Z$).
This pole-zero pair constitutes an approximation to the formant
structure of unvoiced speech sounds. The output of this system also is
passed to the fixed spectral-shaping filter. Digital-to-analog conver- 
sion provides an audible output. 

All the parameters required by the synthesis system of Fig. 1 can 
be estimated automatically from natural speech by recently developed
digital signal processing techniques.*


III. CONCATENATION MODEL 


The Acoustics Research DDP-516 computer facility has been used
to implement a complete answer-back system. A block diagram of the
system used for synthesis of connected speech from a vocabulary of
formant-coded words is shown in Fig. 2. Naturally spoken, isolated
words (or phrases) are analyzed by a formant analyzer to give three
formants ($F_1$, $F_2$, $F_3$), voiced and unvoiced amplitudes ($A_V$, $A_N$),
pitch period ($P$), and unvoiced pole and zero ($F_P$, $F_Z$) once every 10
ms. These control parameters are smoothed by programmed digital
filters, sampled at their Nyquist rates (typically 33-1/3 per second),
quantized, and stored in the word catalog as the reference library. The
typical bit-rate used for storage of these data is 700 b/s when the pitch
signal is saved. When pitch is not saved (the usual situation, since it is
normally calculated by the concatenation program), the bit-rate for the
stored data is 533-1/3 b/s. Table I shows a breakdown of how these
bit-rates are achieved. The data in this table were derived from
experimental investigation of the effects of smoothing and quantization
on the perception of the synthetic output.

Fig. 2—Overall synthesis system, block diagram.

As shown in Table I, at every 10-ms interval the speech is classi- 
fied as voiced or unvoiced (V/U) by a 1-bit signal. Thus, for each 


COMPUTER SYNTHESIS OF SPEECH 1545 


frame, storage is required for either voiced parameters or for unvoiced
parameters, but not for both. It should be noted that the control-parameter
frame rate (33-1/3 per second) is one-third the rate of the V/U signal.
The manner in which the raw data (which are obtained at a rate of 100
per second) are coded to the lower rate, consistent with the frame rate of
the V/U signal, is described in Ref. 3.
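As a check on the arithmetic behind Table I, the per-parameter figures can be summed directly. The short Python sketch below does this with exact fractions; the dictionary layout, row labels, and function name are illustrative choices of ours, not part of the original system.

```python
from fractions import Fraction

# Per-parameter coding from Table I: (bits per frame, frames per second).
# Voiced and unvoiced parameters share a row because each 10-ms frame
# stores one set or the other, never both.
FRAME_RATE = Fraction(100, 3)  # 33-1/3 frames per second
TABLE_I = {
    "F1 or FP": (3, FRAME_RATE),
    "F2 or FZ": (4, FRAME_RATE),
    "F3": (3, FRAME_RATE),
    "P": (5, FRAME_RATE),
    "AV or AN": (3, FRAME_RATE),
    "V/U": (1, Fraction(100)),  # the V/U bit is sent every 10 ms
}


def bit_rate(table, include_pitch=True):
    """Total storage rate in bits per second."""
    total = Fraction(0)
    for name, (bits, rate) in table.items():
        if name == "P" and not include_pitch:
            continue  # pitch is recomputed by the concatenation program
        total += bits * rate
    return total


print(bit_rate(TABLE_I))                       # 700
print(bit_rate(TABLE_I, include_pitch=False))  # 1600/3, i.e. 533-1/3
```

Exact fractions are used so that the thirds in 33-1/3 frames per second do not introduce rounding error into the totals.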

Once input words and phrases are coded in terms of the formant 
representation, they can easily be modified for use with the synthesis 
program. Words can be lengthened or shortened, formants can be 
changed easily, and a pitch contour, different from the one originally 
spoken, can be superimposed on the data. Thus the vocal resonance 
data is available to the synthesis program in a form flexible enough 
to conform to the timing and pitch generated by the concatenation 
program. 

The lower portion of Fig. 2 shows how the system assembles a synthetic
message composed of words and phrases from the reference library.
First, the answer-back program requests the word sequence for a
specific message. The word concatenation program then determines
timing data for the message (in one of several ways to be explained
below) from an auxiliary program. The timing data are in the form of a
word duration for each word in the output message. Next, the
concatenation program accesses, in sequence, the control parameters
for each of the words in the string. A duration adjustment is first made
on each word, so that the word duration in context matches the
duration specified by the timing rules. The concatenation program then
smoothly interpolates the formant control parameters wherever the
final part of one word and the initial part of the following word are
both voiced; the interpolation algorithm is designed to produce
physiologically realistic formant transitions. Finally, a continuous
function for pitch variation is produced for the whole message. All
computed control parameters are output to a hardware digital speech
synthesizer designed in accordance with Fig. 1. Digital-to-analog
conversion produces a continuous synthetic speech output.

TABLE I—CODING OF FORMANT PARAMETERS

Parameter        No. bits/frame   No. frames/second   No. bits/second
F1 or FP               3               33-1/3              100
F2 or FZ               4               33-1/3              133-1/3
F3                     3               33-1/3              100
P                      5               33-1/3              166-2/3
AV or AN               3               33-1/3              100
V/U                    1               100                 100
Total                                                      700
Pitch (not stored)                                        -166-2/3
Data rate for synthesis using calculated pitch data        533-1/3

In the remainder of this section we will detail the way each of the 
above operations is carried out. In Section IV we will give an illustra- 
tive example of the use of the system for the synthesis of 7-digit tele- 
phone numbers. Further, we will describe a dialing experiment, using 
the synthesized numbers and the DDP-516 in an interactive manner, 
to estimate the communicative effectiveness of the synthetic speech. 

The duration computations and the interpolation algorithm of the 
concatenation program depend upon a measure we call “spectral de- 
rivative.” 


3.1 Spectral Derivative

The control parameters are stored in the catalog at a sampling rate 
of 33-1/3 per second. When accessed and used in the synthesis pro- 
gram, however, they are interpolated to a rate of 100 per second; i.e., 
10 ms between frames. For each 10-ms frame of a given word, a cal- 
culation is made of the absolute rate of change of the formant data 
from the previous frame. We call this calculation the spectral deriva- 
tive, since it is a measure of how rapidly the spectrum is changing. 
The spectral derivative is used to determine where to lengthen or 
shorten a word, and is also used to determine at what rate a formant 
transition is made from one voiced interval to the next. 

For each voiced 10-ms interval, the spectral derivative, $SD_i$, is
computed as:

$$SD_i = \sum_{j=1}^{3} \left| F_j(i) - F_j(i-1) \right| \qquad (1)$$

where $i$ indexes the 10-ms intervals in the word, and $F_j(i)$ is the value
of the $j$th formant in the $i$th time interval. This measure of spectral
change is an arbitrarily chosen one; several others could be
considered. For instance, a weighted sum of absolute values of formant
change:

$$SD_i = \sum_{j=1}^{3} a_j \left| F_j(i) - F_j(i-1) \right| \qquad (2)$$


might be a suitable replacement for equation (1) above. By adjusting
the weights, $a_j$, the influence of changes in individual formants
can be made large or small. For example, by making $a_2$ much
larger than $a_1$ or $a_3$, the spectral derivative becomes essentially the
absolute change in the second formant. A more reasonable choice for
the weight $a_j$ might be the reciprocal of the average value of the $j$th
formant; the spectral derivative would then be the sum of relative
changes in the formants. Although there are several possibilities for
the spectral derivative, the measure of equation (1) is the one we use
throughout.
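Equations (1) and (2) can be sketched in a few lines of Python. This assumes each voiced frame is supplied as a tuple (F1, F2, F3) in hertz; the function name is hypothetical, and giving the first frame a value of zero (it has no predecessor) is our own convention.

```python
def spectral_derivative(frames, weights=None):
    """Spectral derivative SD_i for each 10-ms frame of a voiced interval.

    frames:  list of (F1, F2, F3) tuples in Hz, one per 10-ms frame
    weights: optional (a1, a2, a3) for the weighted form, equation (2);
             None gives the unweighted measure of equation (1)
    """
    if weights is None:
        weights = (1.0,) * len(frames[0])
    # The first frame has no predecessor; we assign it 0 (an assumption).
    sd = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        sd.append(sum(a * abs(c - p) for a, c, p in zip(weights, cur, prev)))
    return sd


frames = [(500, 1500, 2500), (520, 1450, 2500), (520, 1450, 2510)]
print(spectral_derivative(frames))  # [0.0, 70.0, 10.0]
```

Passing weights equal to the reciprocals of the average formant values gives the relative-change variant mentioned above.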


3.2 Timing Calculation 

The timing calculation essentially consists of determining the
duration of each of the words and phrases in the context of the message
to be produced. There are several possible methods we have consid- 
ered for determining these durations—ranging from fully automatic 
rules, which use syntactic and grammatical information, to manual 
insertion into the program of the desired timing sequence. 

One technique, and the most accurate way of obtaining timing data, 
is to make measurements from a naturally spoken version of the mes- 
sage and manually supply these data to the program. This possibility 
is indicated at the bottom of Fig. 2 as the external timing data input 
to the timing subroutine. The timing data obtained in this manner 
are optimum and can be used to evaluate the efficacy of other aspects 
of the synthesis rules. This form of input is therefore important for 
evaluational purposes. 

A second technique for obtaining word duration data is to make the 
duration of each word be some fixed percentage of its duration in 
isolation, independent of the message context. The motivation here is
that the duration of a word in isolation is an upper bound on its
duration in context, because vowels are unusually long when a word is
spoken in isolation. Hence some shortened version of the word would suffice in
many contextual situations. Clearly, the more limited the context of 
the message, the more applicable is the above approximation. 

Another technique for obtaining durational data is by simple table- 
lookup procedures. Here the duration of every possible input, in every 
possible contextual position must be tabulated. For limited context 
messages, such as telephone number generation, this table-lookup pro- 
cedure is an attractive way of generating timing information because 
of the limited number of situations which arise in practice. For more 
general situations, the amount of storage necessary would often be- 
come prohibitive. 

The most sophisticated way of generating timing data is to make
calculations based on language rules. A syntactic and phonetic analysis
of the printed text of the message is converted by rules into durational
data about each of the phonemes in the message. For the most
general cases of speech synthesis, i.e., unrestricted context, this kind of
procedure is necessary to obtain good timing data. A computer program
for such sophisticated analysis has recently been developed.* Continuing
work is aimed at combining this program with the concatenation system.


3.3 Word Duration Modification

Once the duration of the $j$th word in the message has been
determined by one of the methods discussed in the previous section, it is
then necessary to modify the set of control signals of the reference
version of the word to match the desired duration. Assume the
duration of the reference version of the $j$th word is $w_j$ frames and the
desired duration is $d_j$ frames, where a frame is 10 ms long. If we
define the symbols:

$$I_P(j) = \begin{cases} 1 & \text{if the end of the } (j-1)\text{st word is voiced and the beginning of the } j\text{th word is voiced,} \\ 0 & \text{otherwise,} \end{cases}$$

$$I_F(j) = \begin{cases} 1 & \text{if the end of the } j\text{th word is voiced and the beginning of the } (j+1)\text{st word is voiced,} \\ 0 & \text{otherwise,} \end{cases}$$

then

$$b_j = w_j - \left[ d_j + \frac{t_c}{2} \left( I_P(j) + I_F(j) \right) \right], \qquad (3)$$

where

$t_c$ = duration (in frames) over which voiced intervals are concatenated,

$b_j$ = number of frames to be eliminated from (if $b_j > 0$) or added to
(if $b_j < 0$) the $j$th reference word.


The reason for the last term in equation (3) is that whenever adjacent
voiced intervals occur between words, they are smoothly merged
together. Hence their durations overlap, and it is this last term which
accounts for the overlap. Typical values of $t_c$ are 4 to 10, i.e., 40 to
100 ms overlap between voiced words.
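A minimal sketch of the frame count in Python, assuming equation (3) has the form $b_j = w_j - [d_j + (t_c/2)(I_P(j) + I_F(j))]$, that is, half of each $t_c$-frame overlap region is charged to each of the two merged words. The function and argument names are hypothetical.

```python
def frames_to_adjust(w_j, d_j, voiced_join_prev, voiced_join_next, t_c):
    """b_j, taken as w_j - [d_j + (t_c / 2) * (I_P(j) + I_F(j))].

    Positive result: frames to remove; negative: frames to add.
    voiced_join_prev / voiced_join_next play the roles of I_P(j) / I_F(j).
    """
    i_p = 1 if voiced_join_prev else 0
    i_f = 1 if voiced_join_next else 0
    return w_j - (d_j + t_c / 2 * (i_p + i_f))


# A 60-frame reference word, wanted at 45 frames, merged on both sides
# with an 8-frame overlap region:
print(frames_to_adjust(60, 45, True, True, 8))  # 7.0
```

When $t_c$ is odd the result is a half-integer, so in practice it would be rounded to a whole number of frames.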

The manner in which the $b_j$ frames are eliminated, or added in, is
based solely on the spectral derivative. To eliminate frames, the $b_j$
frames in the word having the smallest spectral derivatives are
removed. To add frames, the region of the word having the smallest
spectral derivative is located, and $|b_j|$ consecutive frames are inserted
in the middle of this region. The parameter values during the inserted
frames are identical to those of the frame nearest the middle of the
region. The rationale behind this technique is that to lengthen or 
shorten a word, by any significant amount, it is most desirable to do 
this in parts of the word where the spectrum is changing the least. 
Thus the dynamics of the word are always unaltered by this method. 
A linear compression, or expansion, of the whole word is a useful tech- 
nique only when the compression or expansion ratio is close to 1.0. 
This is not always the case in synthesis, and so the above technique 
is used instead. 
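The elimination and insertion rules can be sketched as follows. The text does not say how wide the "region of smallest spectral derivative" is, so the window length `region_len` is a hypothetical parameter of ours, as is the function name; frames are treated as opaque per-frame parameter sets.

```python
def adjust_duration(frames, sd, b, region_len=5):
    """Remove (b > 0) or insert (-b, for b < 0) frames using spectral derivative.

    frames:     list of per-frame control-parameter sets for one word
    sd:         spectral derivative of each frame (same length as frames)
    region_len: HYPOTHETICAL width of the low-change region used for
                insertion; the original text does not specify it
    """
    if b > 0:
        # Drop the b frames with the smallest spectral derivative,
        # keeping the survivors in their original order.
        order = sorted(range(len(frames)), key=lambda i: sd[i])
        drop = set(order[:b])
        return [f for i, f in enumerate(frames) if i not in drop]
    if b < 0:
        n = min(region_len, len(frames))
        # Locate the window of n consecutive frames with least total change.
        start = min(range(len(frames) - n + 1),
                    key=lambda s: sum(sd[s:s + n]))
        mid = start + n // 2
        # Repeat the frame nearest the middle of that region -b times.
        return frames[:mid] + [frames[mid]] * (-b) + frames[mid:]
    return list(frames)


print(adjust_duration(list("abcdef"), [5, 1, 4, 0, 3, 2], 2))
# ['a', 'c', 'e', 'f']
```

Repeating a single frame, rather than stretching the whole word, is what leaves the word's transitions intact.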


3.4 Merging of Isolated Words 

Generally, the manner in which the control signals from isolated 
words are combined is by abutting them directly, once the timing 
modifications described above have been made. However, when the
words to be combined have a common voiced interval (i.e., the end of
the one word and the beginning of the next word are both voiced), a 
more complicated procedure is used to merge the words. This is be- 
cause merely abutting the words would often produce cases where for- 
mants on one side of the word boundary would be vastly different 
from formants on the other side of the boundary. If such data were 
merely abutted, then in synthesizing the message objectionable tran- 
sients would be present at the boundary. To alleviate this problem, a 
merging interpolation algorithm is used. The algorithm is based on the 
spectral derivative, and provides smooth formant transitions from one 
word to the next. 

The merging procedure combines data over the last $t_c$ frames of the
first word and the first $t_c$ frames of the second word. The duration of
$t_c$ frames is called the overlap region of the words. The average
spectral derivative during this region, for both words, is calculated as:

$$\overline{SD1} = \frac{1}{t_c} \sum_{i=0}^{t_c-1} SD1(i) \qquad (4)$$

$$\overline{SD2} = \frac{1}{t_c} \sum_{i=0}^{t_c-1} SD2(i) \qquad (5)$$

where $SD1(i)$ and $SD2(i)$ are the spectral derivatives for the two
words during the $t_c$ overlap frames. Using the notation:

$F_j(i)$ = value of the $j$th formant at frame $i$ during the overlap region,

$F_j^k(i)$ = value of the $j$th formant at frame $i$ during the overlap region
for word $k$,

the interpolation function used is:

$$F_j(i) = \frac{F_j^1(i)\,(t_c - i - 1)\,\overline{SD1} + F_j^2(i)\,i\,\overline{SD2}}{(t_c - i - 1)\,\overline{SD1} + i\,\overline{SD2}}, \qquad i = 0, 1, \ldots, t_c - 1 \qquad (6)$$


Figure 3 illustrates the type of interpolation performed for four simple
cases. (Although all three formants are interpolated in the program,
for simplicity just one formant is drawn in Fig. 3 for each word.) The
interpolated curve always begins at the formant of the first word and
terminates at the formant of the second word. The rate at which the
interpolated curve makes the transition from the formants of the first
word to those of the second word is determined by the average spectral
derivatives $\overline{SD1}$ and $\overline{SD2}$. For case 1 in Fig. 3, $\overline{SD1} \approx 0$, so
$\overline{SD2} > \overline{SD1}$; hence the interpolated curve makes a rapid transition to the
formant of word 2. Case 2 is the reverse of case 1: here $\overline{SD1} > \overline{SD2}$,
so the transition does not occur until near the end of the overlap region.
For case 3, both words have a small spectral derivative; hence the
interpolation function degenerates into a linear transition. For case 4,
both words have large spectral derivatives; hence the transition occurs
about midway through the overlap region.

Fig. 3—Interpolation of control parameter contours for four typical cases.

The data of Fig. 3 show that the interpolated formant function
tends to be a smooth, continuous curve when the above technique is
used. Values of $t_c$, the number of frames in the overlap region, have
ranged from 4 to 10, i.e., 40 ms to 100 ms of overlap.
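Equations (4) through (6) can be sketched for a single formant track in Python. The sketch assumes the overlap frames are indexed $i = 0, \ldots, t_c - 1$, so that the merged curve starts at word 1's value and ends at word 2's. The small constant `eps` is our own addition: without it the first or last denominator of (6) is zero whenever an average spectral derivative is exactly zero.

```python
def merge_overlap(f1, f2, sd1, sd2, eps=1e-9):
    """Merge one formant track across the overlap region, eqs. (4)-(6).

    f1:  formant values over the last t_c frames of word 1
    f2:  formant values over the first t_c frames of word 2
    sd1, sd2: spectral derivatives of the same frames
    eps: small constant (our assumption) that keeps the denominator of
         (6) nonzero when an average spectral derivative is exactly 0
    """
    t_c = len(f1)
    if t_c < 2 or len(f2) != t_c:
        raise ValueError("need two equal-length overlap tracks, t_c >= 2")
    avg1 = sum(sd1) / t_c + eps  # equation (4)
    avg2 = sum(sd2) / t_c + eps  # equation (5)
    merged = []
    for i in range(t_c):
        w1 = (t_c - i - 1) * avg1  # weight on word 1 falls with i
        w2 = i * avg2              # weight on word 2 rises with i
        merged.append((f1[i] * w1 + f2[i] * w2) / (w1 + w2))  # equation (6)
    return merged
```

With equal averages the result is a straight line from word 1 to word 2, as in case 3 of Fig. 3. With $\overline{SD1}$ near zero, every frame after the first is pulled almost entirely to word 2, as in case 1.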


3.5 Pitch Calculation 


One of the most important aspects of speech synthesis is the deter- 
mination of a suitable pitch variation for the message being produced. 
We have considered several ways of obtaining pitch information. These 
have included: 


(i) Supplying a pitch contour extracted from a naturally spoken 
version of the message: These data, when used with similarly ex- 
tracted timing data, give the most natural sounding messages that 
can be obtained with the technique. This form of input is most use- 
ful for evaluation purposes, but is not practical for an automatic 
system. 

(ii) Using an archetypal pitch contour: For limited-context
applications this technique supplies a contour with realistic
intonation, and hence is quite acceptable. The use of monotone pitch
throughout the message is a special case of an archetypal contour,
but such a contour gives an unacceptable drone to the speech, and
hence would only be used in special situations.

(iii) Calculating a pitch contour by rule based on a stress analysis
of the text of the message: This is a difficult task, but is most
appropriate for an unlimited-context, fully automatic system.
Present research* on this topic makes it an attractive possibility
for incorporation into a concatenation system.

(iv) Using the pitch variations associated with the isolated ver- 
sions of the word, and concatenating them to give the overall pitch 
contour: This technique is unacceptable, unless several versions of 
each word are stored in the library, because the pitch contour of the 
isolated word tends to characterize the word only in isolation. The 
pitch usually rises sharply at the beginning of the word, and falls 
sharply at the end of the word. When concatenated, the words sound 
distinct, rather than merging into a continuous message. 
3.6 Gain Parameters

The voiced gain parameter, $A_V$, is preserved on a frame-by-frame
basis along with the formants. When formants are merged, the gain
parameter is also merged. The unvoiced gain parameter, $A_N$, is also
preserved on a frame-by-frame basis along with the fricative pole and
zero. $A_N$ is not required to be merged.


IV. AN ILLUSTRATIVE EXAMPLE 


Figure 4 illustrates how these synthesis rules are applied in a typical 
case. At the top of this figure are shown the resonance data for four 
words spoken in isolation. The first and fourth words are entirely 
voiced, and the second and third words contain both voiced and
unvoiced sections. The duration of each of the words spoken in isolation
is shown by the $w_j$'s in Fig. 4a. In order to form a message composed
of these four words, the following steps occur:


(i) The duration of each word in the specified context is determined.

(ii) Duration adjustments are made (frames removed or inserted)
to match the timing of step (i).

(iii) Since words 1 and 2 do not share a common voiced interval,
the time-adjusted control signals for word 1 are accessed.

(iv) Since words 2 and 3 do share a common voiced interval, all
but the last $t_c$ frames of the time-adjusted control signals for
word 2 are accessed and abutted to the controls from word 1.

(v) The last $t_c$ frames from word 2 are interpolated with the first
$t_c$ frames from word 3, and added on to the previous control
signals.

(vi) Since words 3 and 4 do not share a common voiced interval,
the remaining control signals for word 3 and the time-adjusted
control signals for word 4 are added on to the previous control
signals.

(vii) A pitch contour for the entire message is calculated.

(viii) The message is synthesized.


The resulting control signals and pitch contour are shown in Fig. 4b. 
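Steps (iii) through (vi) can be sketched for one control-parameter track. The dictionary keys and function name below are hypothetical. The default `merge` is a plain linear cross-fade used only as a stand-in; in the system described here the spectral-derivative interpolation of equation (6) would be passed in its place.

```python
def concatenate_track(words, t_c, merge=None):
    """Assemble one control-parameter track for a whole message.

    words: list of dicts, each with
        "track":      per-frame values for the duration-adjusted word
        "voiced_in":  True if the word begins voiced
        "voiced_out": True if the word ends voiced
    t_c:   overlap length in frames (assumed >= 2)
    merge: function(tail, head) -> t_c merged frames; the default is a
           plain linear cross-fade, a stand-in for equation (6)
    """
    if merge is None:
        def merge(tail, head):
            n = len(tail)
            return [a + (b - a) * i / (n - 1)
                    for i, (a, b) in enumerate(zip(tail, head))]

    out = []
    skip = 0  # leading frames of the current word already used in a merge
    for j, word in enumerate(words):
        track = word["track"][skip:]
        nxt = words[j + 1] if j + 1 < len(words) else None
        if nxt is not None and word["voiced_out"] and nxt["voiced_in"]:
            # Common voiced interval: abut all but the last t_c frames,
            # then append the merged overlap region.
            out.extend(track[:-t_c])
            out.extend(merge(word["track"][-t_c:], nxt["track"][:t_c]))
            skip = t_c
        else:
            # No common voiced interval: abut the rest of the word.
            out.extend(track)
            skip = 0
    return out


# Four words with the voicing pattern of Fig. 4 (toy values, t_c = 2):
words = [
    {"track": [1, 1, 1], "voiced_in": True, "voiced_out": True},
    {"track": [2] * 6, "voiced_in": False, "voiced_out": True},
    {"track": [5] * 4, "voiced_in": True, "voiced_out": False},
    {"track": [9, 9], "voiced_in": True, "voiced_out": True},
]
print(concatenate_track(words, 2))
# [1, 1, 1, 2, 2, 2, 2, 2.0, 5.0, 5, 5, 9, 9]
```

The message is 13 frames long, one overlap region ($t_c = 2$) shorter than the 15 frames of the four words laid end to end.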


4.1 Synthesis of Telephone Numbers 

For evaluation of this technique we chose the limited-context situation
of synthesis of the carrier phrase “The number is” followed by a
7-digit telephone number. Here, the timing was generated by a simple
table-lookup procedure. The timing data we used are shown in Table
II. The table shows the digit duration (in milliseconds) as a function
of the number of phonemes in that digit and its position in the string.
The data in the table were obtained from measurements on real 7-digit
numbers and, in effect, constitute first-order statistics on duration.
The influence of context (position in the digit string) is easily seen in
Table II. For example, any digit in the third position is from 50 to
90 percent longer than the same digit in the sixth position.

Fig. 4—Typical example of how control parameters are generated from the
word library store. A message composed of four words is illustrated. All
parameters are functions of time.

A single archetypal pitch contour was used in all cases. The
archetypal form was taken to match as well as possible the general shape
of the pitch contours measured in naturally spoken 7-digit numbers.
This basic shape was used to calculate the pitch contour for each
number string requested by the answer-back program. Informal listening
suggested that this pitch contour was adequate as an initial estimate,
and was a substantial improvement over the pitch information
associated with individual isolated words.

TABLE II—SIMPLE TIMING RULES FOR 7-DIGIT NUMBERS

The synthesis program ran on the Honeywell DDP-516 computer. 
The isolated digits were analyzed and stored in the computer memory 
at a data rate of 533-1/3 b/s. The concatenation program accepted an 
input sequence from the typewriter or card reader, computed the con- 
trol signals for the message, smoothed them by programmed digital 
filters, and outputted the data to a hardware digital terminal-analog
synthesizer* in real time. Figure 5 shows a spectrographic comparison
between a typical computer-generated 7-digit number, and a natural 
version of the same number. The timing and formant data of the syn- 
thesized example are seen to be reasonably good matches to those of 
the natural utterance. 


4.2 Dialing Experiment Using Synthetic Speech 

To evaluate the communicative effectiveness of the concatenated
synthetic speech in a real dialing situation, we arranged for the
DDP-516 computer to speak telephone numbers (both natural and
synthetic) to a listener in a sound booth. The listener was provided a
conventional Touch-Tone telephone with which he could dial the numbers.
A central office Touch-Tone decoder received the dialed signals,
decoded them, and presented them via a data channel to the computer.
The computer maintained a running analysis of the results. The
experimental arrangement is shown in Fig. 6.

Fig. 5—Spectrogram comparison between synthetic and natural versions of
a typical telephone number.

In the experiment we compared four types of speech. These included: 


I. Naturally-spoken, 7-digit telephone numbers. 
II. Naturally-spoken, isolated digits, abutted together. 
III. Synthetic isolated digits, abutted together. 
IV. Concatenated digits produced by the concatenation program 
method. 


Listeners, seated in a sound booth, heard telephone numbers over 
the Touch-Tone telephone. After a prescribed delay, they were re- 
quired to dial the number just heard. The DDP-516 computer gen- 
erated the signal, read the number dialed, and tabulated the results. 

Figure 7 shows the total number of dialing errors for 12 subjects. 
The dialing errors are broken down into digit errors (i.e., number of 
digits incorrectly dialed) and telephone number errors (i.e., number 
of phone numbers with one or more digit errors). The lower pair of 
curves shows the number of digit errors and the number of phone-number
errors for a 1-second delay before dialing. The upper pair of curves
shows the corresponding results for a 5-second delay.
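The two error counts can be sketched as a short Python tabulation. The text does not say how omitted or extra digits were scored, so charging each one as a single digit error is our assumption. The function name is hypothetical.

```python
def count_errors(presented, dialed):
    """Tabulate digit errors and telephone-number errors.

    presented, dialed: equal-length lists of digit strings, one per trial.
    A digit error is a position where the dialed digit differs (plus, by
    assumption, one error per missing or extra digit); a number error is
    any number containing one or more digit errors.
    """
    digit_errors = 0
    number_errors = 0
    for spoken, typed in zip(presented, dialed):
        wrong = sum(1 for s, t in zip(spoken, typed) if s != t)
        wrong += abs(len(spoken) - len(typed))  # missing or extra digits
        digit_errors += wrong
        if wrong:
            number_errors += 1
    return digit_errors, number_errors


print(count_errors(["5551234", "5559876"], ["5551234", "5559867"]))  # (2, 1)
```

A transposition such as 76 dialed as 67 thus costs two digit errors but only one telephone-number error, which is why the two curves in Fig. 7 are plotted against different totals (4200 digits versus 600 numbers).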

An analysis of variance of these data indicated that, at the 95 per- 
cent level of confidence, there existed no significant difference between 
dialing performances with the natural and the synthetic concatenated 
signals (i.e., between speech types I and IV). In other words,
synthetic, concatenated speech is comparable to natural speech in dialing
effectiveness. 
Fig. 6—Experimental arrangement used to measure the communicative
effectiveness of several types of speech.


[Fig. 7 plot: digit errors (left axis, number of errors in 4200 digits) and telephone-number errors (right axis, number of errors in 600 telephone numbers) versus speech type I to IV, for 1-second and 5-second delays.]

Fig. 7—Experimental results showing the total number of digit errors and
telephone number errors. Four types of speech are tested: I, natural digits;
II, natural, abutted digits; III, synthetic abutted digits; IV, concatenated digits.
Response delays of 1 and 5 seconds are tested.


The differences between speech types I or IV and types II or III,
however, were significant at the 95 percent level. That is, digit strings
produced by simple abutting (II and III) led to a greater number of
dialing errors. The suggestion is that the concatenation program is
effective in reducing the dialing errors over that which would result 
from mere abutting of the digits. 

Another factor of interest, of course, is the naturalness of the signal.
Some preliminary informal experiments indicate that listeners rank
the naturalness of these four signals in order of their “machine
attributes,” i.e., type I speech is ranked most natural, followed by types
II, III, and IV. The synthetic concatenated signal has more
machine-made features than any of the others, with pitch and duration both
being calculated by machine. One might be willing to accept machine
accent if the signal has attractive advantages in communicative
accuracy and economy of storage. Formant synthesis using the
concatenation technique appears to have both.
Listener Evaluation of Simulated Telephone Calling Signals


By P. D. BRICKER 
(Manuscript received December 2, 1970) 


This research concerns the judged pleasantness of a variety of elec- 
tronic calling signals. Five experiments are reported in which type and 
frequency of carrier; type, frequency, and waveform of modulation; 
and spectral composition were varied. The results have aided in the 
selection of two signals for further trials. 


I. INTRODUCTION 


Technological considerations suggest that an electronic “ringer” may 
succeed electromechanical devices in future telephones. This possibility 
has generated interest within the telephone industry in the specifica- 
tion of desirable characteristics for electronic calling signals. Not the 
least important of these characteristics is that such signals be accept- 
able to the subscriber on purely aesthetic grounds. The literature on 
tone ringers includes some references* to the measurement of
listeners’ opinions, but only recently has any systematic work on what
constitutes a pleasant signal begun to appear.* An earlier study by
P. D. Bricker and J. L. Flanagan* marked the beginning of an attempt 
to chart the preference-relevant dimensions of a fairly large class of 
calling signals. The present paper reports five subsequent experiments 
that have clarified the effects on evaluative judgments of a half-dozen 
parameters. These experiments have interacted with studies of calling- 
signal detectability and with development work to produce two dis- 
tinct realizable signals, which are scheduled for evaluation in a field 
trial. 


II. EXPERIMENT 1


2.1 Background 
The first experiment was identical in form to that reported by 
Bricker and Flanagan.* That is, listeners heard one signal at a time 
and assigned each a number that reflected their opinion of the signal.
This technique, in conjunction with a means of analyzing the data in 
terms of perceptual attributes, had provided considerable information 
about a limited variety of signals in the earlier study. The purpose of 
the present experiment was to obtain a rough idea of the parameters 
important to evaluation in a much larger domain of signals, to serve as 
a guide to more detailed investigations. 


2.2 Signals 

There were 100 different signals in this experiment of three basic 
types: 

(i) Thirty-four amplitude-modulated pulse-train-carrier (AMPC)
signals, which were a subset of the signals studied by Bricker and
Flanagan.* These represented selected combinations of four modulation
frequencies, three duty factors, three carrier frequencies, and three
harmonic compositions.

(ii) Six amplitude-modulated sinusoidal-carrier (AMSC) signals,
representing three modulation frequencies and two carrier frequencies.

(iii) Sixty frequency-modulated sinusoidal-carrier (FMSC) signals,
representing selected combinations of six modulation frequencies,
three carrier frequencies, three amounts of frequency deviation, and
five modulation waveforms.


2.3 Listeners 


Forty-three persons of various nonsupervisory employment classifi- 
cations at Bell Laboratories served as listeners. 


2.4 Procedure 


Groups of four to six listeners, seated around a table in a carpeted 
room with draperies, listened to one of four permutations of the 100 
signals reproduced over a high-quality magnetic tape playback sys- 
tem. They were instructed to record a positive number on the answer 
sheet for signals they liked and a negative number for signals they 
disliked; the greater the degree of liking or disliking, the larger the 
positive or negative number. A new signal occurred every 6 seconds, 
so that the entire procedure, including reading the instructions and rest 
periods, required about 15 minutes. 


2.5 Analysis 
The listeners’ ratings were arranged in a matrix of 48 rows (listeners)
by 100 columns (signals), and each row was normalized so that
it had mean $\mu = 0$ and standard deviation $\sigma = 1$. These data were
analyzed by the MDPREF computer program of J. J. V. Chang and
J. D. Carroll* so as to produce a spatial representation of both signals
and listeners. MDPREF
solutions represent the stimuli (calling signals, in this case) as points 
and the subjects (listeners) as vectors in multidimensional space in 
such a way that the projections of the points on each subject’s vector 
correspond maximally, in a least squares sense, to his input data 
vector. Another property of these solutions is that successive dimen- 
sions are orthogonal to those preceding and account for as much of the 
residual variance as possible. It is left to the experimenter to determine 
how many solution dimensions will be considered significant and how 
he will rotate the axes to render the solution interpretable. 
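The kind of factorization MDPREF produces can be illustrated with a small pure-Python sketch. It rests on an assumption of ours, not a description of the Chang-Carroll program: that the method amounts to row-normalizing the listener-by-signal matrix and taking a truncated singular value decomposition, computed here by power iteration with deflation. The names `normalize_rows` and `mdpref_sketch` are hypothetical. Signal points are the rows of V·S and listener vectors the rows of U, so each normalized rating is approximated by the dot product of a signal point with a listener vector.

```python
import math

def normalize_rows(Y):
    """Rescale each listener's ratings to mean 0 and (population) s.d. 1.
    Assumes no listener gave every signal the same rating."""
    out = []
    for row in Y:
        mu = sum(row) / len(row)
        sd = math.sqrt(sum((x - mu) ** 2 for x in row) / len(row))
        out.append([(x - mu) / sd for x in row])
    return out

def mdpref_sketch(Y, k=2, iters=500):
    """Hypothetical MDPREF-style factorization Z ~ U (V S)^T of the normalized
    listener-by-signal matrix Z, found by power iteration on Z^T Z.
    Returns signal points (rows of V S), listener vectors (rows of U),
    and the fraction of total variance carried by each dimension."""
    Z = normalize_rows(Y)
    n = len(Z[0])
    M = [[sum(r[a] * r[b] for r in Z) for b in range(n)] for a in range(n)]
    total = sum(M[a][a] for a in range(n))  # sum of squared entries of Z
    points = [[0.0] * k for _ in range(n)]
    vectors = [[0.0] * k for _ in Z]
    fractions = []
    for d in range(k):
        # Non-constant start: the all-ones vector lies in the null space of M,
        # because every normalized row sums to zero.
        v = [1.0 + 0.1 * a for a in range(n)]
        for _ in range(iters):
            w = [sum(M[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(M[a][b] * v[b] for b in range(n)) for a in range(n))
        if lam <= 1e-12 * total:  # treat numerically negligible dimensions as empty
            lam = 0.0
        s = math.sqrt(lam)  # singular value of dimension d
        fractions.append(lam / total)
        for a in range(n):
            points[a][d] = v[a] * s
        for i, r in enumerate(Z):  # listener coordinate u = (Z v) / s
            vectors[i][d] = sum(r[a] * v[a] for a in range(n)) / s if s else 0.0
        # Deflate so the next pass finds the next orthogonal dimension.
        M = [[M[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]
    return points, vectors, fractions

# Two listeners with opposite tastes over three signals: one dimension suffices.
points, vectors, fractions = mdpref_sketch([[1, 2, 3], [3, 2, 1]], k=1)
print([round(f, 6) for f in fractions])  # [1.0]
```

The variance fractions play the role of the per-dimension percentages reported for the solutions below; the choice of how many dimensions to keep and how to rotate them is left, as in the text, to the experimenter.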

Another technique found useful in interpreting the results of this 
experiment was to regress the coordinates of the stimulus points on 
physical property vectors, so as to locate vectors maximally corre- 
sponding to the parameters that were varied to generate the stimuli. 
In the earlier experiment,? regression techniques had been used to 
find a three-dimensional structure in the data that was interpretable 
in terms of three parameters of signal design. 
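One common way to locate such a property vector, assumed here rather than taken from the study, is to regress the property values on the stimulus coordinates and normalize the regression weights to unit length; the multiple correlation then indicates how confidently the property can be identified with a direction. The function `property_direction` below is a hypothetical two-dimensional sketch.

```python
import math

def property_direction(coords, prop):
    """Least-squares direction in a 2-D stimulus space that best predicts a
    physical property (e.g., modulation frequency): regress prop on (x, y)
    with an intercept, then normalize the weights.
    coords: list of (x, y) stimulus points; prop: one value per stimulus.
    Returns (unit direction, multiple correlation R)."""
    n = len(coords)
    mx = sum(x for x, _ in coords) / n
    my = sum(y for _, y in coords) / n
    mp = sum(prop) / n
    X = [(x - mx, y - my) for x, y in coords]
    p = [v - mp for v in prop]
    sxx = sum(x * x for x, _ in X)
    syy = sum(y * y for _, y in X)
    sxy = sum(x * y for x, y in X)
    sxp = sum(x * v for (x, _), v in zip(X, p))
    syp = sum(y * v for (_, y), v in zip(X, p))
    det = sxx * syy - sxy * sxy  # assumes the points are not collinear
    bx = (syy * sxp - sxy * syp) / det  # normal equations, Cramer's rule
    by = (sxx * syp - sxy * sxp) / det
    length = math.hypot(bx, by)
    fitted = [bx * x + by * y for x, y in X]
    r_squared = sum(f * f for f in fitted) / sum(v * v for v in p)
    return (bx / length, by / length), math.sqrt(r_squared)
```

With more solution dimensions the same idea applies with a larger set of normal equations; a property whose R is low cannot be identified with any single direction, which is the sense of "sufficient confidence" used in the results.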

Finally, S. C. Johnson’s hierarchical clustering analysis* was applied
to the data quite independently of the multidimensional scaling. This 
technique groups stimuli (signals) according to their mutual closeness 
in terms of a distance measure provided by the user. The inter- 
stimulus distance measure used for these data was defined as follows: 


$$d_{jk} = \sum_i (R_{ij} - R_{ik})^2,$$

where $d_{jk}$ is the squared distance between stimuli $j$ and $k$, and $R_{ij}$ is the
rating given to the $j$th stimulus by the $i$th subject. This measure hopefully
reflects the similarity of treatment of two signals by each listener.
The computer-implemented Johnson technique produces a hierarchy 
of clustering of n objects, running all the way from n ‘‘clusters”’ of one 
object each to one cluster of n objects. It also computes a measure of 
compactness of the clusters at each level between these extremes. Using 
this measure, as defined by Johnson,’ we traced the compactness of 
various clusters as they grew in size to maximum compactness, in an 
effort to define types of signals. 
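The distance above and a Johnson-style clustering can be sketched in a few lines. The sketch assumes Johnson's "maximum" (complete-link) method, in which a cluster's level is its diameter, so a smaller level means a more compact cluster; which of Johnson's methods the study used is not stated in this section, and the function names are hypothetical.

```python
def squared_distances(R):
    """d[j][k] = sum over subjects i of (R[i][j] - R[i][k])**2,
    where R[i][j] is subject i's rating of stimulus j."""
    n = len(R[0])
    return [[sum((row[j] - row[k]) ** 2 for row in R) for k in range(n)]
            for j in range(n)]

def complete_link(d):
    """Agglomerative clustering in the spirit of Johnson's 'maximum' method:
    repeatedly merge the two clusters whose farthest members are closest.
    Returns (members, level) for each merge; level is the merged cluster's
    diameter under d."""
    clusters = [frozenset([j]) for j in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                level = max(d[j][k] for j in clusters[x] for k in clusters[y])
                if best is None or level < best[0]:
                    best = (level, x, y)
        level, x, y = best
        merged = clusters[x] | clusters[y]
        merges.append((sorted(merged), level))
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return merges
```

The merge list runs from n one-object clusters to a single n-object cluster, matching the hierarchy described above; tracing a cluster's level as it grows is one way to follow its compactness.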


2.6 Results 
The first six dimensions of the MDPREF solution accounted for 28, 
22, 6, 4, 4, and 3 percent of the variance, respectively. The large drop 
after the second dimension suggests that only two dimensions of the
solution are interpretable. 

Of the parameters used for regression analysis, only modulation 
frequency (MF) could be located with sufficient confidence to identify 
it with a dimension of the solution. However, interpretation of the 
two-dimensional solution as a whole was greatly aided by the cluster 
analysis. Six clusters, including almost all the stimuli, were found to 
be at maximum compactness at about the same level in the hierarchy. 
Inspection of the membership of each cluster suggested a name for the 
type of signal comprising the cluster. Furthermore, when the stimulus 
points were projected on the plane of the first two MDPREF dimensions,
a fairly simple closed curve could be drawn around each of the clusters
without overlapping the others. This projection, along with the cluster
contours and the subjects’ vectors, is shown in Fig. 1. The pulse-carrier 
signals are shown as open squares, the sine-carrier AM signals as 
shaded circles, and the sine-carrier FM signals as filled circles. Sub- 
jects’ vectors, normalized to unit length in two dimensions*, are 
shown as arrowheads pointing in the direction of higher evaluation. 
The vertical axis (Dimension I) corresponds to the MF vector, with 
low MF (5, 7, 10 Hz) at the bottom of the figure and high MF (20, 
40, 80 Hz) toward the top. 

The import of the results for calling-signal design is conveyed by 
consideration of the gross characteristics of the signal clusters. Al- 
though a more detailed examination of the results with respect to 
parameter levels reveals interesting information about tone perception, 
these findings are deemed inappropriate for present purposes. Com- 
plete details are available from the author.* 

The signals in both clusters below the horizontal axis of Fig. 1 
(“gliding pitch” and “trills”) are, with three exceptions, FM signals 
modulated at a slow enough rate that the pitch can be heard to change. 
In the case of “gliding pitch” signals, the pitch changes in a continuous 
manner, while the tone of the “trills” jumps from one pitch to another. 
The three exceptions are AMSC signals of which the single pitch 
alternates with silence. Note that no listener’s vector is located so as 
to indicate a clear preference for gliding pitch signals, whereas quite 
a few listeners evaluate trills higher than any other type of signal. 


* This normalization was adopted in the interest of presenting an uncluttered 
picture. A similar plot in which overall percentage of variance accounted for was 
represented by the distance of each arrowhead from the origin revealed no sys- 
tematic relation between orientation and length of vectors. 
Fig. 1—Projections of 100 signals and 43 listeners’ vectors on Dimensions I and
II of the MDPREF solution for Experiment 1. Arrowheads represent listeners’
vectors, filled circles FMSC signals, shaded circles AMSC signals, and open 
squares AMPC signals. 


The one characteristic shared by all of the signals in the “favorites” 
cluster is an MF of 20 Hz. They also have low or medium carrier 
frequencies. This cluster is positioned so as to receive high ratings 
from many listeners who prefer 10- or 40-Hz MF somewhat more. Its 
name derives from the fact that it includes the signals with the three 
highest mean ratings and three others in the top ten. The single 
AMPC signal to reach this distinction was also first in the earlier 
study*; it appears again in Experiment 4. 

The four signals in the “high MF-low $F_c$” cluster all have MF = 40
or 80 Hz and a carrier frequency of 400 Hz. Some listeners clearly prefer
these signals, though not many other high-MF signals, to those with
lower MF.
The name of the “high MF-high $F_c$” group is self-explanatory
(“high $F_c$” means carrier frequencies of 1,600 Hz or 800 Hz for AMSC
and FMSC signals, and 900 or 700 Hz for AMPC signals). This group
receives few high ratings. The last group (“low duty factor”) is almost
entirely AMPC signals with a low or medium duty factor, which 
renders harmonics of the modulating frequency prominent. No lis- 
tener’s vector is located so as to indicate a preference for these signals. 


2.7 Conclusions 

Some tentative principles of good calling-signal design suggested by 
these results are: 

(i) There seems to be an optimum modulation frequency around
20 Hz, even though individuals vary widely with respect to this
parameter. Both the superiority of 20 Hz and the diversity of listeners were
observed by Bricker and Flanagan.*

(ii) Pitch should change abruptly rather than gradually at low
modulation frequencies.

(iii) Smooth amplitude modulation (high duty factor) is superior
to abrupt amplitude modulation (low duty factor).

(iv) Low carrier frequency and a narrow, low-centered spectrum
are advantages, while high-frequency energy is a disadvantage,
whether it arises from a high carrier frequency or the presence of 
harmonics of the carrier. 

The first two principles receive further support from subsequent 
experiments, the third is not investigated further, and the fourth is 
explored and refined in the next three experiments. 


III. EXPERIMENT 2


3.1 Background 

The problem most in need of attention after Experiment 1 was the 
status of pulse-carrier signals. It is clear that most of the AMPC
signals and the “gliding pitch” FMSC signals received low ratings, 
and that the two types are separated in the solution space (Fig. 1). 
However, they are separated chiefly on Dimension I, which corre- 
sponds to modulation frequency, and for very good reason: the AMPC 
signals in the study all had MF ≥ 10 Hz, while the “gliding pitch”
signals all had MF ≤ 10 Hz. The two types project to similar points
on Dimension II, which is enigmatic: this dimension could reflect either
a perceptual characteristic or merely the confounding in the experi- 
mental design. The purpose of Experiment 2, then, was to determine 
whether AMPC signals required a dimension of their own to describe
their perceptual relations with FMSC signals. The physical correlate of 
the dimension thought to be useful for this purpose is the bandwidth 
of the signals: AM sine-carrier signals have a narrower bandwidth
than FMSC signals, which in turn are narrower than AMPC signals. 
Experiment 2 includes signals of all three types, each represented at 
the same values of MF so as to remove that source of confounding. 
Note that a narrow bandwidth was listed as a desirable characteristic 
in the fourth conclusion of Experiment 1, and that Experiment 2 is 
designed to provide additional information on this point. 

Since this experiment and those that follow use a novel method of 
collecting data, the procedure and some results it has produced will be 
briefly reviewed. The technique, called auditory sorting, has been 
described in the literature.® Briefly, it provides the listener with an 
array of movable pushbuttons, each of which evokes a distinctive 
sound. The listener arranges the buttons (sounds) in groups or in order 
according to instructions. In an early experiment with this technique, 
listeners were asked to group 24 three-parameter frequency-modulated 
tones according to similarity. Using an appropriate multidimensional 
scaling technique, it was possible to recover a perceptual space that 
closely resembled that recovered from the much more laborious pair- 
comparison procedure in a companion experiment.?° In another experi- 
ment, listeners ordered a subset of the tones used in Experiment 1 
according to preference. From these data, MDPREF constructed a space 
very much like that based on the rating data for the same subset. The 
strategy in this experiment, then, was to include enough FMSC signals 
to recover a three-dimensional perceptual space and then observe 
whether the AMPC signals, the AMSC signals, or both required an 
additional dimension to account for their evaluations. 


3.2 Signals 

Twenty-four signals, all of which were derived from an 800-Hz 
carrier frequency, were used in this experiment. They were of three 
types: FMSC (n = 18), AMSC (n = 3), and AMPC (n = 3). Each
type was represented at the same three modulation frequencies: 10 Hz,
20 Hz, and 40 Hz. There were six FMSC signals at each MF, realizing
all combinations of two FM waveforms (sinusoidal and rectangular)
and three amounts of frequency deviation (±3, ±10, and ±25 percent).
The AM signals were modulated with a symmetrical sinusoidal
envelope, and the PC signals employed a carrier with approximately 
equal-amplitude components at 800, 1,600, and 2,400 Hz. 
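The factorial structure of this signal set can be enumerated directly. The sketch below is illustrative only: the tuple layout and the names `experiment2_signals` and `fm_extremes` are hypothetical, and the deviation values are the ±3, ±10, and ±25 percent used for the plotting symbols in Section 3.6. It confirms that 18 FMSC signals plus three AMSC and three AMPC signals give the 24 signals stated above, and shows the frequency swing each deviation implies on the 800-Hz carrier.

```python
from itertools import product

MODULATION_FREQS_HZ = (10, 20, 40)
FM_WAVEFORMS = ("sinusoidal", "rectangular")
FM_DEVIATIONS_PCT = (3, 10, 25)
CARRIER_HZ = 800

def experiment2_signals():
    """Enumerate the design as (type, MF in Hz, waveform, deviation in percent)."""
    signals = [("FMSC", mf, wf, dev)
               for mf, wf, dev in product(MODULATION_FREQS_HZ, FM_WAVEFORMS,
                                          FM_DEVIATIONS_PCT)]
    for kind in ("AMSC", "AMPC"):
        for mf in MODULATION_FREQS_HZ:
            # AM signals: sinusoidal envelope, no frequency deviation
            signals.append((kind, mf, "sinusoidal AM", 0))
    return signals

def fm_extremes(dev_pct, carrier_hz=CARRIER_HZ):
    """Lowest and highest instantaneous frequency for a +/- dev_pct swing."""
    return carrier_hz * (100 - dev_pct) / 100, carrier_hz * (100 + dev_pct) / 100

print(len(experiment2_signals()))  # 24
print(fm_extremes(10))             # (720.0, 880.0)
```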
3.3 Listeners


Thirty Bell Laboratories employees, 17 male and 13 female, served 
as listeners in this experiment. 


3.4 Procedure 


The listener was seated in a small sound-attenuating booth with 
the sorting apparatus in front of him and a loudspeaker on the wall 
above it. He was shown how each button evoked a different sound 
(signal) from the speaker, and instructed to arrange the signals 
(buttons) from right to left according to “how much [he] would like 
each of these tones if it replaced the telephone bell.” Listeners were 
permitted to produce partial orderings, i.e., to arrange the signals in 
groups such that the evaluative ordering obtained between groups, 
but signals within a group were tied. 


3.5 Analysis 

The listeners’ rank vectors were each normalized and arranged in a 30 (listeners)
by 24 (signals) matrix to serve as input for MDPREF. Regression techniques
were used to find, in the resulting space, three orthogonal directions
that best corresponded to the three parameters of the FMSC signals.
A fourth dimension was located so as to satisfy three conditions:
(i) mutual orthogonality to the first three, (ii) maximization of residual
variance accounted for, and (iii) close nonlinear correspondence to a
property defined by three levels of spectral width—narrow (AMSC), 
medium (FM), and wide (AMPC). 


3.6 Results 


In the figures that present the results, plotting symbols have been 
used that suggest the parameter values of the signals they represent. 
Thus, large symbols are used for 25-percent frequency modulation, 
medium for 10-percent, and small for 3-percent; round symbols for 
sinusoidal FM, square for rectangular FM. Modulation frequency in 
Hz is given by arabic numerals. For the AM signals, sinusoidal and 
pulse carriers are distinguished by symbols representing one cycle of 
their respective waveforms. 

The projections of all 24 signals on the plane of the first two solution 
dimensions are shown in Fig. 2a, and their projections on the plane of 
the third and fourth dimensions are shown in Fig. 2b. For the 18 FMSC 
signals, the first three dimensions represent modulation frequency, 
modulation percentage (MP), and modulation waveform (WF), re- 
Fig. 2—Projections of 24 signals on the MDPREF solution dimensions for
Experiment 2: (a) Dimensions I and II; (b) Dimensions III and IV. 
spectively. This configuration resembles that obtained from the afore-
mentioned similarity-judgment experiments’ in sufficient detail to 
support the conclusion that the present experiment reveals dimensions 
of perceptual significance. 

Note that the AM signals fall in appropriate places on Dimension I 
(MF) and that they are just beyond the 3-percent FM signals on 
Dimension II (MP); this latter location is consistent with their having 
0 percent frequency deviation. The chief function of Dimension IV is
to separate the AM signals, with the AMPC signals to the left and the 
AMSC signals to the right. This dimension accounts for 10 percent of 
the variance, which compares favorably with 28, 13, and 16 percent 
for the first three, respectively. 

The listeners’ vectors are shown in Fig. 3a and b. Whereas Fig. 3a 
shows the usual diversity of opinion with respect to MF (and MP as 
well), Fig. 3b shows a considerable concentration of vectors so as to 
reflect low ratings for both AMPC and low-rate sinusoidal FM signals. 
It is not clear from these figures whether evaluation continues to im- 
prove as bandwidth is reduced. Subjects are in fact evenly divided as 
to whether their highest-ranked AMSC signal is ranked above their 
highest-ranked FMSC signal, and mean normalized rank is slightly 
higher for FM. Thus, although bandwidth is established as a percep- 
tually significant parameter, listener evaluation is not a monotonic 
function of it. 


3.7 Conclusion 

A reasonable accounting of the data demands that AM signals be 
regarded as differing from the (three-dimensional) FM signals on a 
fourth dimension. Although signal bandwidth provides a satisfactory 
interpretation of this dimension, its relation to listener evaluation is 
other than monotonic. 


IV. EXPERIMENT 3 


4.1 Background 

While the experiments to this point had considerably clarified the 
nature of the perceptual space, they had not explored all combinations 
of parameter values. In particular, Experiment 2 suffers from con- 
founding of carrier type with modulation type (FM or AM), even 
though it served to un-confound these parameters with modulation 
frequency. Thus we are left with the formal possibility that Dimension 
IV of Experiment 2 could be interpreted as “modulation type” rather 
Fig. 3—Locations of listeners’ vectors with respect to the dimensions shown in
Fig. 2: (a) Dimensions I and II; (b) Dimensions III and IV. 
than “bandwidth,” and with the practical possibility that pulse-carrier
signals might receive higher ratings if frequency modulated. Experi- 
ment 3 allows for complete factorial arrangement of both types of 
carrier with both types of modulation. 


4.2 Signals 

All possible combinations of three modulation frequencies (10, 20, 
and 40 Hz), two modulation types (AM and FM), and two carrier 
types (sinusoidal and pulse) were generated, for a total of twelve 
signals. The carrier frequency was 800 Hz, and the FM modulating 
waveform was rectangular with ±10-percent deviation. The pulse
carrier contained the first three harmonics of the carrier and the AM 
parameters were as in Experiment 2. 


4.3 Listeners 


Forty-four Bell Laboratories employees served as listeners. They
were selected in approximately equal numbers from each of four
age-sex categories (over or under 30; male or female). In addition, each listener
submitted to an audiometric screening test; none was found to exhibit 
a clinically significant loss in the range of the signals under test. 


4.4 Procedure 


Each listener produced an evaluative ordering or partial ordering 
of the 12 signals, using the sorting apparatus under the same instruc- 
tions as in Experiment 2. At the termination of the ranking procedure, 
each listener was asked to state whether he liked “the bell on his home
telephone” more than none, some, or all of the tones, and to locate it 
in the hierarchy if “some”. 


4.5 Analysis 


The objective in this and subsequent experiments was more to 
identify optimum parameter combinations than to discover perceptual 
dimensions. Consequently, the results are presented in terms of con- 
ventional statistical summaries and tests of significance rather than 
multidimensional analysis. The basic datum is the rank assigned each 
of the 12 signals by each of the 44 listeners. 


4.6 Results 

The median rank assigned each of the 12 signals is shown in Table 
I, where lower numbers indicate higher evaluations. It is immediately 
apparent that sine carrier ranks higher than pulse carrier in each of the 
six comparisons with the other factors held constant. Neither of the
other parameters shows quite so consistent an effect. An analysis of
variance showed all three parameters to have statistically significant
effects, with carrier type the largest and modulation type the smallest.
Again 20 Hz is the highest-ranking modulation frequency; the rank
distributions for most of the 10-Hz and 40-Hz signals reflect the
diversity of opinion about MF observed in the earlier experiments, in that
they are generally bimodal or broadly dispersed. Overall, FM is somewhat
higher ranked than AM. Another finding of the analysis of variance
was that there was no systematic difference among the four
age-sex groups in their patterns of evaluation.

TABLE I—MEDIAN RANK ASSIGNED EACH OF TWELVE SIGNALS,
EXPERIMENT 3

Modulation Type                    FM                 AM
Carrier Type                 Sine    Pulse      Sine    Pulse
Modulation Frequency (Hz)
    10                        4.5     5.6        5.5     7.5
    20                        2.8     6.9        3.8     6.9
    40                        5.8     9.4        4.7     9.3

Table II shows, for each signal, the number of listeners who ranked 
it above or equal to their remembered concept of the bell. The pattern 
of preferences here is much the same as that shown by median rank 
(in fact, it would be statistically the same if there were no correlation 
between electronic signal evaluation and “bell” evaluation). The main 
value of this measure is to give the signal ratings some external refer- 
ence, however tenuous. 
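The claim that the sine carrier outranks the pulse carrier in all six comparisons can be checked mechanically against Table I. This is a descriptive check on the published medians only, not a reproduction of the analysis of variance; `MEDIAN_RANK` and `sine_vs_pulse` are hypothetical names.

```python
# Median ranks from Table I (lower = better liked),
# keyed by (modulation type, carrier type, modulation frequency in Hz).
MEDIAN_RANK = {
    ("FM", "sine", 10): 4.5, ("FM", "pulse", 10): 5.6,
    ("FM", "sine", 20): 2.8, ("FM", "pulse", 20): 6.9,
    ("FM", "sine", 40): 5.8, ("FM", "pulse", 40): 9.4,
    ("AM", "sine", 10): 5.5, ("AM", "pulse", 10): 7.5,
    ("AM", "sine", 20): 3.8, ("AM", "pulse", 20): 6.9,
    ("AM", "sine", 40): 4.7, ("AM", "pulse", 40): 9.3,
}

def sine_vs_pulse():
    """For each (modulation, MF) cell, True if the sine carrier has the
    better (lower) median rank than the pulse carrier."""
    return {
        (mod, mf): MEDIAN_RANK[(mod, "sine", mf)] < MEDIAN_RANK[(mod, "pulse", mf)]
        for mod in ("FM", "AM")
        for mf in (10, 20, 40)
    }

print(all(sine_vs_pulse().values()))  # True
```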


4.7 Conclusion 


The data show clearly that at least for a carrier frequency of 800 
Hz, a signal containing three harmonics of the carrier is less well liked 
than its single-frequency counterpart, regardless of whether amplitude 
or frequency modulation is employed. Furthermore, there is no practical
advantage to frequency-modulating a pulse-carrier signal, although
FMSC signals are again slightly superior to AMSC signals.

TABLE II—NUMBER OF SUBJECTS (OUT OF 44) RANKING EACH
SIGNAL ABOVE OR EQUAL TO THE “BELL”, EXPERIMENT 3

Modulation Type                    FM                 AM
Carrier Type                 Sine    Pulse      Sine    Pulse
Modulation Frequency (Hz)
    10                         11      12         17       7
    20                         23      11         19       5
    40                         14       5         12       4

Discovery of a way to improve evaluations of pulse-carrier signals 
would be valuable, because such broadband signals have a well-estab- 
lished advantage in detectability. A search for such a means gave rise 
to the next experiment, which, although brief and unavailing, was
informative in other respects.


V. EXPERIMENT 4 


5.1 Background 


The best-liked FM signal (20-Hz, 10-percent rectangular FM) has a 
musical aspect worth noting: the two alternating frequencies (720 Hz 
and 880 Hz) stand in a relation close to a major third, generally 
thought to be a pleasing musical interval. The pulse-carrier signals 
used so far also have a musical aspect: the three components (e.g., 
800, 1,600, and 2,400 Hz) establish two intervals—an octave and a 
perfect fifth. While not dissonant, the fifth is generally considered
harsher and less pleasing than the third. But certain members of the 
harmonic series other than the first three can be chosen so as to gen- 
erate thirds (fourth and fifth harmonic) and pleasing inverted (or 
open) triads. This experiment was an attempt to improve the rating of 
pulse-carrier signals by selecting such combinations. 
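The interval arithmetic in this paragraph can be made concrete by expressing frequency ratios in cents (1,200 cents per octave). The sketch assumes just-intonation ratios between harmonics (2:1 octave, 3:2 fifth, 5:4 major third); `cents` and `harmonic_intervals` are hypothetical helper names.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

def harmonic_intervals(fundamental_hz, harmonics):
    """Intervals in cents between successive selected harmonics."""
    freqs = [fundamental_hz * h for h in harmonics]
    return [cents(hi / lo) for lo, hi in zip(freqs, freqs[1:])]

# First three harmonics of 800 Hz: an octave, then a fifth.
print([round(c, 1) for c in harmonic_intervals(800, [1, 2, 3])])  # [1200.0, 702.0]
# Fourth, fifth, and sixth harmonics: a major third, then a minor third.
print([round(c, 1) for c in harmonic_intervals(800, [4, 5, 6])])  # [386.3, 315.6]
```

Selecting harmonics 4, 5, and 6 of a fundamental thus yields a major triad built from a major third and a minor third, which is the kind of combination this experiment explored.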


5.2 Signals 


Each of the eight signals in this experiment had three components 
whose relative amplitudes decreased at a rate of 3 dB per octave from 
500 to 4,000 Hz. All signals were amplitude modulated as before at 20 
Hz. The frequencies of all the components in kHz are shown in Table 
III. The more important musical aspects of this set are: (i) Signals
1, 2, 3, and 6 are compact, in that adjacent harmonics of the respective
fundamentals are selected. Of the compact signals, only number 3
represents a major triad (second inversion); (ii) Signals 4, 5, 7, and 8
are open, in that certain harmonics are suppressed in between those 
that are passed. Each of the open tones constitutes a major triad, in- 
verted and opened to span more than an octave. 
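The 3-dB-per-octave amplitude tilt can be written as a level relative to 500 Hz, $L(f) = -3\log_2(f/500)$ dB. The sketch below assumes the slope is referenced to 500 Hz and applied as a continuous tilt across the 500 to 4,000 Hz range; the function names are hypothetical.

```python
import math

def relative_level_db(freq_hz, ref_hz=500.0, slope_db_per_octave=-3.0):
    """Component level in dB relative to a component at ref_hz,
    for a constant slope in dB per octave."""
    return slope_db_per_octave * math.log2(freq_hz / ref_hz)

def relative_amplitude(freq_hz, **kw):
    """The same tilt as a linear amplitude ratio (20 log10 convention)."""
    return 10 ** (relative_level_db(freq_hz, **kw) / 20)

print(relative_level_db(2000))  # -6.0
```

For example, a 2,000-Hz component lies two octaves above 500 Hz and so sits 6 dB lower, and a 4,000-Hz component sits 9 dB lower.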


5.3 Listeners 


Bell Laboratories employees were selected to represent wide varia- 
tion in musical skill and training, and to have normal hearing over 
the range of component frequencies. The experiment was terminated 
after six listeners had been run.


TELEPHONE CALLING SIGNALS 1573 


TABLE III—FREQUENCIES IN kHz OF COMPONENTS OF EIGHT
SIGNALS, EXPERIMENT 4

Signal No.    First Component    Second Component    Third Component

5.4 Procedure 


Listeners were asked to use the sorting apparatus to arrange the 
eight signals in evaluative order, as before. 


5.5 Results 

Regardless of musical background, no subject ranked any of the
five triad-producing signals first. The signal most often ranked first 
was number 1, which had the lowest frequencies for all three compo- 
nents. Number 6 was the only other signal ranked above the median by 
all listeners. Listeners described the low-ranked “musical” signals as 
“high-pitched,” “tinny,” and “jarring.” One listener who was sophisticated
in both music and acoustics recognized the differences in musicalness,
but averred that the high-frequency components were irritating,
even though they served to complete a chord. Since even those listeners 
expected to be most favorably disposed toward chord-signals rejected 
them in favor of signals with low component frequencies, the experi- 
ment was terminated after only six listeners. 

In addition to rejecting the notion that musicalness of component 
intervals might “rescue” broad-spectrum signals, this experiment 
affords certain informative comparisons with earlier studies. For ex- 
ample, the first three signals have the same bandwidth, but differ in 
location of their spectra. Rankings plummet from first through average 
to last as frequency of components increases across this set. Signal 
number 6 has a greater bandwidth than any of these and a highest 
component between those of signals 2 and 3, yet its overall rank was 
equivalent to that of signal number 1. These facts suggest that listener 
evaluation depends in a complex way on the frequencies of the compo- 
nents, so that both the average frequency and the highest frequency 
can be determining. Bandwidth per se, it seems, is much less important. 
Searching through the details of Experiment 1 produces support for 
this notion, and further suggests that the region between 2,000 and 
2,500 Hz is critical for the upper component, and 1,500 to 2,000 Hz for
the average. One would interpret the poor showing of pulse-carrier signals in
Experiments 2 and 3 in retrospect as an invasion of a critical frequency
region by the highest component, rather than as an effect of bandwidth 
or the mere presence of harmonics. 

It happens that signal number 2 in Experiment 4 has exactly the 
same specifications as the lone AMPC signal to join the “favorites” 
cluster in Experiment 1 (Fig. 1). Since signal number 2 was outranked 
in Experiment 4 by signal number 6, even though the latter has a 
component above 2,000 Hz, we might expect signal number 6 to com- 
pete well with sine-carrier signals. However, the most similar signal in 
Experiment 3 (AMPC, 20 Hz, components at 800, 1,600 and 2,400 Hz) 
was considered equal to or better than “the bell” by only 5 out of the 
44 listeners, compared to 23 out of 44 for the best FMSC signal (see 
Table II). Although it is possible that three harmonics of 750 Hz 
(especially with 3 dB per octave attenuation) could be much more 
pleasant than three harmonics of 800 Hz, it is safer, in the absence of 
a complete map of component-frequency effects, to take these Ex- 
periment 3 findings as a guide to how signal number 6 would fare 
against FMSC signals. The reason for the attention given here to 
signal number 6 is a practical one: its acoustic specifications are 
exactly the same as those of a ringer now under development." This 
ringer has been shown” to be satisfactorily detectable in typical room
noise, and to be superior in this respect to a narrow-spectrum FMSC 
signal. The present experiments suggest that while a three-harmonic 
750-Hz signal is a good one of its type, it is likely to be less well liked 
by listeners than a good FMSC signal. 

Since the laboratory affords no way to equate pleasantness and 
detectability, evaluation of both leading signals under operating condi- 
tions seemed an appropriate means of resolving the conflict between 
the criteria. To make this evaluation possible, the Telephone Laboratory
at Indianapolis modified its basic design so that it could generate
an FM signal. Questions that arose in the course of this redesigning
effort prompted the next and last experiment. 


VI. EXPERIMENT 5 


6.1 Background 


The ringer consists mainly of an electromagnetic transducer and a 
resonant cavity. The resonator must be small enough to fit inside a 
telephone station set and large enough to resonate the lowest frequency 
component of the desired signal. The higher the carrier frequency, the 
smaller the ringer could be, so the listener-evaluation function of 
carrier frequency ($F_c$, or average frequency) is important design information.
Experiment 1 had indicated that signals with $F_c$ = 1,600 Hz were not
as well liked as those with $F_c$ = 800 Hz, but there was also a suggestion
that evaluation was not monotonic with $F_c$. In any event, the
parameter $F_c$ had not been explored in sufficiently small steps to guide
the design of a ringer for narrow-band signals.

Another purpose of Experiment 5 was to assess the effects of super- 
imposed amplitude modulation on listener evaluation. This question 
arose because such amplitude modulation was found to be technically 
difficult to eliminate from the design under consideration. 


6.2 Signals 

There were eight FMSC signals involved in this experiment. Each 
was rectangularly frequency-modulated at 20 Hz, with the upper and 
lower components standing in the ratio of 5 to 4 in frequency—a major 
third. There were four values of $F_c$, as shown in Table IV, along with
the upper and lower component frequencies. There were two signals
at each $F_c$, one of which was pure FM, the other of which was amplitude
modulated at 20 Hz in such a way as to imitate the effect found
in the practical design. 
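The component frequencies in Table IV follow from two constraints: the components stand in a 5:4 ratio, and $F_c$ is their arithmetic mean. Writing the upper and lower components as $5u$ and $4u$, the mean $4.5u = F_c$ gives lower $= 8F_c/9$ and upper $= 10F_c/9$. The sketch below (hypothetical name `components`, exact rational arithmetic via `fractions.Fraction`) reproduces the table.

```python
from fractions import Fraction

RATIO = Fraction(5, 4)  # upper : lower, a just major third

def components(fc_hz):
    """Upper and lower tones whose arithmetic mean is fc_hz and whose
    frequency ratio is 5:4, i.e. lower = 8 fc / 9 and upper = 10 fc / 9."""
    fc = Fraction(fc_hz)
    lower = 2 * fc / (1 + RATIO)
    upper = RATIO * lower
    return upper, lower

for fc in (1350, 1125, 900, 675):
    print(fc, [int(x) for x in components(fc)])
# 1350 [1500, 1200]
# 1125 [1250, 1000]
# 900 [1000, 800]
# 675 [750, 600]
```

Applied to the 1,012.5-Hz value adopted for the narrow-band ringer, the same relation puts the components at 1,125 and 900 Hz.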


6.3 Listeners 
Thirty listeners, 10 male and 20 female, were recruited from among 
Bell Laboratories’ clerical, shop, and technical employees. 


6.4 Procedure 
Each listener used the auditory sorting apparatus to rank or par- 
tially rank the eight signals, as in Experiments 3 and 4. 


6.5 Results 
The number of listeners who assigned each rank to each of the eight
signals is shown in Fig. 4. This method of presenting the data is resorted
to here because the extreme bimodality of the ranking for $F_c$ = 675
Hz renders any measure of central tendency misleading. For practical
purposes, one would wish to avoid a signal about which listener opinion
is so divided. There is more of a consensus on low ranks for the signals
with $F_c$ = 1,350 Hz, and there is little to choose among the signals with
$F_c$ = 1,125 or 900 Hz.

TABLE IV—FREQUENCIES IN Hz OF $F_c$ AND BOTH COMPONENTS
OF SIGNALS TESTED IN EXPERIMENT 5

   $F_c$      Upper Component    Lower Component
   1,350          1,500              1,200
   1,125          1,250              1,000
     900          1,000                800
     675            750                600

Detailed analyses confirm the impression given by the figure that 
there is little difference between signals with or without superimposed 
AM. A tally was made of the outcome of each of the four same-carrier- 
frequency comparisons for each listener, the possible outcomes being 
FM > AM, AM > FM, and FM = AM. The results were that AM
made essentially no difference to 11 listeners, while only 6 listeners
preferred FM in three or four of the four comparisons, and only 5 listeners
preferred AM to FM in three or four of the four comparisons.

The results of this experiment are viewed as supporting two engi- 
neering decisions taken subsequently, rather than as evidence for more 
general conclusions. The first was to develop a narrow-band FM ringer 
with $F_c$ = 1,012.5 Hz, which is midway between the two generally
acceptable $F_c$ values in the experiment. The second was not to attempt to
eliminate the superimposed AM, in the light of the indifference to it 
apparent in the experiment. 


[Figure: histogram panels labeled by $F_c$ (e.g., 1350, 1125 Hz) and by pure FM versus +AM; axes: RANK and NUMBER OF SUBJECTS.]

Fig. 4—Histograms of number of listeners assigning each of eight ranks to
each of eight signals, Experiment 5; N = 30 for each histogram.
VII. SUMMARY OF CONCLUSIONS


Results of these experiments permit the specification of two ringers 
—one narrow-band and one broad-band—that are promising with 
regard to listener evaluation. The narrow-band signal is frequency 
modulated at 20 Hz, with its upper and lower components tuned to a 
frequency ratio of 5 to 4, averaging around 1 kHz. The broad-band 
signal is amplitude modulated at 20 Hz, and has three roughly equal- 
amplitude components at 750, 1,500, and 2,250 Hz. The evidence indi- 
cates that the narrow-band signal will be preferred to the broad-band 
signal by a majority of listeners when the two are in direct contest,
and that the broad-band signal is more detectable in typical room 
noise when the two are equated for power. 

Laboratory studies do not reveal, however, how listeners will evaluate
either signal after some experience with it in actual use. Neither
do they tell us how effective these signals will be in practice. A field 
trial involving residential subscribers is planned to gather information 
on these points. This trial also makes it possible for subscribers to tell 
us something about the tradeoff between the pleasantness advantage 
of the narrow-band signal and the detectability advantage of the 
broad-band signal by adjusting their volume controls. Informal experi- 
ments have indicated that listeners attenuate an unpleasant signal, 
when given a volume control in the laboratory. If subscribers behave 
similarly in the field, we shall have as one of the data the amount by 
which they offset the detectability advantage of the less pleasant sig- 
nal. In any event, we shall collect data on answering times, no-answer 
rates, and opinions-after-experience. As useful as the laboratory 
studies have been, they could not have provided information on these 
crucial points. 

The field trial will also provide an opportunity for a direct compari- 
son of these two tone ringers with a widely used gong (C4). There 
are so many differences between tone ringers and gong ringers, ranging 
from the obvious difference in excitation to the factor of familiarity, 
that interpretation of this aspect of the study will be difficult. Never- 
theless, these results may require modification of some of the principles 
(e.g., those pertaining to bandwidth) derived from this series of ex- 
periments on tone ringers alone. 
On a Class of Rearrangeable Switching
Networks 


Part I: Control Algorithm 


By D. C. OPFERMAN and N. T. TSAO-WU 
(Manuscript received December 1, 1970) 


An algorithm is developed to control a class of rearrangeable switching
networks, particularly those with the base-2 structure. Various methods
of implementing this algorithm are also described. System organiza- 
tion and processing time for rearranging the network are studied and 
are shown to be practical. 


I. INTRODUCTION 


One type of switching network that has drawn considerable interest
lately is the class of rearrangeable switching networks (RSN).
With these networks, any idle input terminal of the network can always
be connected to any idle output terminal by rerouting the existing
connections if necessary. These networks can be used where one-to-one
full access and nonblocking features are required and rerouting
is feasible, e.g., main distribution frames¹ and facility switches² in
telephone systems, and data transfer networks in a multiprocessor
computer system.³

Most of the earlier efforts, notably by C. Clos,⁴ V. E. Beneš,⁵ and
A. E. Joel, Jr.,⁶ have been made in the context of telephone switching
networks. Their emphasis has been on the network structure, on its
combinatorial properties, and on bounds on the number of connections
that require rerouting. Recently, this type of network has been of
interest in such computer areas as data-sorting systems⁷ and
self-repairing multiprocessors.⁸ The network structure is also applicable to
cellular arrays.⁹ However, very few reports⁹,¹⁰ have been made on the
control aspect of these networks.

This paper will begin with a brief discussion of the general structure
of RSN's, followed by the development of a method for the control
of these networks and its practical implementation. The relationship
between the network structure and the ease (or difficulty) with
which it can be controlled will also be discussed.


II. THE NETWORK STRUCTURE 


Discussion will be limited to a class of rearrangeable switching networks
connecting N input terminals and N output terminals [abbreviated
as (N × N) networks]. Extension to the more general case of
(N × M) networks, N ≠ M, can be readily made.

Let N = dq, where d and q are integer factors of N. An (N × N)
network can be decomposed into an input stage and an output stage
having altogether 2N/d (d × d) networks (one of which can be eliminated)
and a middle stage having d (N/d × N/d) networks, as shown
in Fig. 1. This network is said to have a base-d structure, and the
smaller networks are called subnetworks. This type of network structure
falls into the general class considered by Clos and Beneš.

The network with base-2 structure is of great importance for two
primary reasons. First, it yields the most efficient network (that is, the
least number of two-state switching elements) and, second, its control
is relatively simple. It consists of (2 × 2) networks in the input
and output stages, altogether (N − 1) in number whether N is even or
odd. Two (N/2 × N/2) networks are in the middle stage if N is even,
or one [(N − 1)/2 × (N − 1)/2] network and one [(N + 1)/2 × (N + 1)/2]
network are in the middle stage if N is odd, as shown in Fig. 2a
and b. Clearly, further decomposition of the networks in the middle
stage is possible, and if the base-2 structure is carried throughout, one
may show by an iterative process that the total number of basic switching
elements, or (2 × 2) networks, named β-elements by Joel,⁶ is given
by*

$$N\lceil \log_2 N \rceil - 2^{\lceil \log_2 N \rceil} + 1.$$

Fig. 1—Rearrangeable (N × N) network of a general base-d structure.

Fig. 2—An (N × N) network with base-2 structure for: (a) N even; (b) N odd.

Since each β-element has only two states and the network must be able
to realize all N! permutations, the number of β-elements is bounded
below by $\lceil \log_2 N! \rceil$. An (11 × 11) network consisting entirely
of β-elements is shown in Fig. 3. It is a very efficient network, since it
has 29 such elements, while any network must have at least 26
($26 > \log_2 11! > 25$) two-state devices to accommodate all possible
permutations. Some of the enumeration studies given in Part II will
account for the additional states.
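The count and the bound can be checked numerically. The Python sketch below (function names are illustrative, not from the paper) evaluates $N\lceil\log_2 N\rceil - 2^{\lceil\log_2 N\rceil} + 1$ and $\lceil\log_2 N!\rceil$ with exact integer arithmetic; for N = 11 it gives 29 and 26, the figures quoted above.

```python
import math


def beta_element_count(n: int) -> int:
    """Beta-elements in an n x n network with base-2 structure carried throughout."""
    k = (n - 1).bit_length()          # ceil(log2 n) for n >= 1
    return n * k - 2 ** k + 1


def min_two_state_devices(n: int) -> int:
    """Lower bound ceil(log2 n!): two-state devices needed to reach all n! permutations."""
    return (math.factorial(n) - 1).bit_length()


for n in (4, 11, 32):
    print(n, beta_element_count(n), min_two_state_devices(n))
```

For N = 4 the two numbers coincide (5 and 5), so the base-2 network is optimal there; for larger N the gap is the "additional states" mentioned above.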


III. THE NETWORK CONTROL 


The control algorithm is first developed for the general (N × N)
network having a base-d structure, as shown in Fig. 1. The
special case d = 2 will then be considered, and practical implementations
of the control algorithm will be given in Section IV. First,
some definitions are needed.


* $\lceil x \rceil$ is the smallest integer greater than or equal to x.
Fig. 3—An (11 × 11) network with base-2 structure.


3.1 Definitions 


(i) An input-output pair $\binom{x}{\pi(x)}$ defines a connection between input
terminal x and output terminal $\pi(x)$ of an (N × N) network, where
$1 \le x \le N$ and $1 \le \pi(x) \le N$.

(ii) A connection set C is a collection of m such input-output pairs,
$m \le N$, expressed as follows:

$$C = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \\ \pi(x_1) & \pi(x_2) & \cdots & \pi(x_m) \end{pmatrix}.$$

If m = N, C is then in the form of the familiar permutation. This may
be denoted by P, with $C \subseteq P$, describing all the connections (or the
traffic pattern) through the (N × N) network.

(iii) Let I = {1, 2, ..., N} be a set of positive distinct integers;
a subset I(l, d) is defined by*

$$I(l, d) = \left\{ a \;\middle|\; \left[\frac{a + d - 1}{d}\right] = l,\ a \in I \right\},$$

where d divides N, and l is some constant integer, $1 \le l \le N/d$.
I(l, d) is called an integer set of order d and of characteristic l, and
any element belonging to it is said to have characteristic l relative to
the base d. This is merely a formal way of grouping all those terminals
associated with the same subnetwork in the input (or output)
stage.

(iv) A connection set C having m input-output pairs is said to be 


* [w] denotes the integral value of w. 


REARRANGEABLE SWITCHING NETWORKS I 1583 


reducible if and only if on replacing every integer by its characteristic, 
C becomes a permutation on m distinct integers. 
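Definitions (iii) and (iv) translate directly into code. The sketch below is an illustration rather than part of the paper; it uses the 1-based terminal numbering of this section, and the helper names `characteristic`, `integer_set`, and `is_reducible` are hypothetical.

```python
def characteristic(a: int, d: int) -> int:
    """Characteristic of terminal a (1-based) relative to base d: [(a + d - 1)/d]."""
    return (a + d - 1) // d


def integer_set(l: int, d: int) -> list[int]:
    """I(l, d): the d terminals served by the l-th (d x d) outer-stage subnetwork."""
    return list(range((l - 1) * d + 1, l * d + 1))


def is_reducible(pairs: list[tuple[int, int]], d: int) -> bool:
    """Definition (iv): replacing every integer by its characteristic must give
    a permutation on m distinct integers."""
    top = [characteristic(x, d) for x, _ in pairs]
    bottom = [characteristic(y, d) for _, y in pairs]
    return len(set(top)) == len(top) and sorted(top) == sorted(bottom)


# Pairs (x, pi(x)) taken from the (15 x 15), d = 5 example of Section 3.2.3.
print(is_reducible([(1, 11), (8, 5), (15, 10)], 5))  # characteristics 1,2,3 over 3,1,2
print(is_reducible([(1, 11), (2, 15), (3, 4)], 5))   # all inputs have characteristic 1
```

The first call returns True and the second False: the second set draws all three inputs from the same input-stage subnetwork.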


3.2 The Generalized Control Algorithm 
Any given set of connections through the network can be described
as:

$$P = \begin{pmatrix} x_1 & x_2 & \cdots & x_N \\ \pi(x_1) & \pi(x_2) & \cdots & \pi(x_N) \end{pmatrix}.$$

The objective of the control algorithm is to derive from P the permutations
to be realized by each of the subnetworks. This is accomplished by
first decomposing P into reducible connection sets $C_1, C_2, \ldots, C_d$.
Permutations for the middle subnetworks are defined by using the
characteristics of the elements in these sets, and permutations for the
input and output stages are determined directly from these elements.


3.2.1 To Decompose a Permutation into Reducible Connection Sets 


Let the output terminals be partitioned into sets denoted by $S_l$, where

$$S_l = \{\pi(x) \mid x \in I(l, d)\}, \quad 1 \le l \le N/d.$$

The reducible connection sets $C_i$, $1 \le i \le d$, are constructed by grouping
N/d input-output pairs whose output integers are selected,
one from each $S_l$, such that no two output integers have the same
characteristic.
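Let $C_i$ be expressed as

$$C_i = \begin{pmatrix} x_{i,1} & x_{i,2} & \cdots & x_{i,N/d} \\ \pi(x_{i,1}) & \pi(x_{i,2}) & \cdots & \pi(x_{i,N/d}) \end{pmatrix}, \quad 1 \le i \le d;$$

when it is reduced, one has the permutation

$$P_i = \begin{pmatrix} p_{i,1} & p_{i,2} & \cdots & p_{i,N/d} \\ \pi(p_{i,1}) & \pi(p_{i,2}) & \cdots & \pi(p_{i,N/d}) \end{pmatrix},$$

where $p_{i,j}$ and $\pi(p_{i,j})$ are the characteristics of the integers $x_{i,j}$ and
$\pi(x_{i,j})$, respectively, $1 \le i \le d$, $1 \le j \le N/d$. Each of these d permutations
$P_i$ can be realized by any of the d (N/d × N/d) subnetworks.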
However, if each $C_i$ is assigned to a particular (N/d × N/d) subnetwork,
one of the (d × d) subnetworks, say the one at the upper-left corner
of the input stage (Fig. 1), can be eliminated. This elimination implies
that the connection set $C_i$, and hence $P_i$, is assigned to the first
(N/d × N/d) subnetwork in the middle stage if and only if $x_{i,j} = 1$
for some j, $1 \le j \le N/d$. In general, $C_i$, and hence $P_i$, is assigned to
the kth (N/d × N/d) subnetwork in the middle stage, $1 \le k \le d$
(counting from the top), if and only if $x_{i,j} = k$ for some j, $1 \le j \le N/d$.
One may then reorder the indices i such that $C_i$ contains the input-output
pair $\binom{i}{\pi(i)}$, and it follows that the permutations $P_1, P_2, \ldots, P_d$
are similarly ordered for the d (N/d × N/d) subnetworks in the middle
stage.
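The text does not spell out a general-d selection rule at this point, and Section IV notes that the selections cannot, in general, be made one at a time. One standard way to make the simultaneous choice, swapped in here and not taken from the paper, is to treat input groups and output characteristics as the two sides of a d-regular bipartite multigraph and peel off d perfect matchings with augmenting paths (Kuhn's algorithm); Hall's theorem guarantees that every round succeeds. The function `decompose` is hypothetical and uses 1-based terminals.

```python
def decompose(perm: dict[int, int], d: int) -> list[list[tuple[int, int]]]:
    """Split a permutation {x: pi(x)} on 1..N into d reducible connection sets.

    Each round finds one perfect matching between the input groups I(l, d) and
    the output characteristics; the sets are then ordered so C_i contains input i.
    """
    def char(a: int) -> int:
        return (a + d - 1) // d

    q = len(perm) // d                 # N/d groups on each side
    unused = set(perm)                 # inputs not yet placed in any C_i
    sets: list[list[tuple[int, int]]] = []
    for _ in range(d):
        owner: dict[int, int] = {}     # output characteristic -> input matched to it

        def augment(group: int, seen: set[int]) -> bool:
            # Try each unused input of this group, re-routing earlier choices if needed.
            for x in range((group - 1) * d + 1, group * d + 1):
                k = char(perm[x])
                if x not in unused or k in seen:
                    continue
                seen.add(k)
                if k not in owner or augment(char(owner[k]), seen):
                    owner[k] = x
                    return True
            return False

        for group in range(1, q + 1):
            if not augment(group, set()):
                raise RuntimeError("no perfect matching: input is not a permutation")
        chosen = sorted(owner.values())
        unused -= set(chosen)
        sets.append([(x, perm[x]) for x in chosen])
    sets.sort(key=lambda c: c[0][0])   # C_i holds input i, 1 <= i <= d
    return sets


example = dict(zip(range(1, 16), [11, 15, 4, 2, 6, 1, 7, 5, 8, 9, 12, 14, 3, 13, 10]))
for c in decompose(example, 5):
    print(c)
```

Because the decomposition is not unique, the sets found this way for the (15 × 15) example of Section 3.2.3 need not coincide with those of Table I; each one is nonetheless reducible.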


3.2.2 To Obtain Permutations for Subnetworks in the Input and Output Stages

Let $P_{I,1}, P_{I,2}, \ldots, P_{I,N/d}$ and $P_{O,1}, P_{O,2}, \ldots, P_{O,N/d}$ be the sets
of permutations to be realized by the 1st, 2nd, ..., (N/d)th (d × d) subnetworks
in the input and output stages, respectively. Moreover, let
the reducible connection set $C_i$ be written as

$$C_i = \begin{pmatrix} x_{i,1} & x_{i,2} & \cdots & x_{i,N/d} \\ \pi(x_{i,1}) & \pi(x_{i,2}) & \cdots & \pi(x_{i,N/d}) \end{pmatrix}, \quad 1 \le i \le d,$$

such that $x_{i,1} < x_{i,2} < \cdots < x_{i,N/d}$.

Then, from the network structure, one can see simply that the
permutations

$$P_{I,j} = \begin{pmatrix} \pi_j^{-1}(1) & \pi_j^{-1}(2) & \cdots & \pi_j^{-1}(d) \\ 1 & 2 & \cdots & d \end{pmatrix}$$

and

$$P_{O,k} = \begin{pmatrix} 1 & 2 & \cdots & d \\ \pi_k(1) & \pi_k(2) & \cdots & \pi_k(d) \end{pmatrix}$$

can be obtained from the $C_i$ with

$$\pi_j^{-1}(a) = x_{a,j} - (j - 1)d, \quad 1 \le j \le N/d,\ 1 \le a \le d,$$

where $\pi_j^{-1}(a)$ denotes the input terminal to be connected to the output
terminal a, and

$$\pi_k(a) = \pi(x_{a,t}) - (k - 1)d, \quad 1 \le k \le N/d,$$

for some t such that $\pi(x_{a,t}) \in I(k, d)$.
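Given the ordered sets $C_1, \ldots, C_d$, the two relations above can be applied mechanically. The sketch below uses a hypothetical helper, 1-based numbering, and Python dictionaries standing in for the permutations; it builds $\pi_j^{-1}(a)$ for every input-stage subnetwork j and $\pi_k(a)$ for every output-stage subnetwork k. The (4 × 4) decomposition in the usage line is a made-up illustration, not an example from the paper.

```python
def stage_permutations(sets: list[list[tuple[int, int]]], d: int):
    """From ordered reducible sets C_1..C_d, build the outer-stage permutations.

    p_in[j][a] = pi_j^{-1}(a): local input of input subnetwork j routed to its
    output a (and hence to middle subnetwork a).
    p_out[k][a] = pi_k(a): local output reached from input a of output subnetwork k.
    """
    q = len(sets[0])                          # N/d subnetworks in each outer stage
    p_in = {j: {} for j in range(1, q + 1)}
    p_out = {k: {} for k in range(1, q + 1)}
    for a, c in enumerate(sets, start=1):     # C_a is carried by middle subnetwork a
        for j, (x, y) in enumerate(sorted(c), start=1):
            p_in[j][a] = x - (j - 1) * d      # x_{a,j} lies in I(j, d) since C_a is sorted
            k = (y + d - 1) // d              # characteristic of the output pi(x_{a,j})
            p_out[k][a] = y - (k - 1) * d
    return p_in, p_out


# (4 x 4), d = 2, pi = (1 2 3 4 / 3 1 4 2) with C_1 = {(1,3), (4,2)}, C_2 = {(2,1), (3,4)}.
p_in, p_out = stage_permutations([[(1, 3), (4, 2)], [(2, 1), (3, 4)]], 2)
print(p_in)
print(p_out)
```

In the result, input subnetwork 1 maps 1 to 1 and 2 to 2, i.e., it is always straight through, which is why it can be eliminated.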


3.2.3 An Example on Decomposing a Permutation 

Consider a (15 × 15) network having a base d = 5 structure. Such
a network is shown in Fig. 4, with two (5 × 5) subnetworks in the
input stage, three (5 × 5) subnetworks in the output stage, and five
(3 × 3) subnetworks in the middle stage. Let the connections through
the network be described by the following permutation:

$$P = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 11 & 15 & 4 & 2 & 6 & 1 & 7 & 5 & 8 & 9 & 12 & 14 & 3 & 13 & 10 \end{pmatrix}.$$

Fig. 4—A (15 × 15) network with permutations assignment.


The output integers are partitioned as follows: 
$S_1 = \{11, 15, 4, 2, 6\}$,
$S_2 = \{1, 7, 5, 8, 9\}$,
$S_3 = \{12, 14, 3, 13, 10\}$.


From these, the reducible connection sets $C_1$, $C_2$, $C_3$, $C_4$, and $C_5$
are constructed as shown in Table I; in general, they are not unique.
The corresponding permutations $P_i$, also shown in Table I,
are obtained from the $C_i$ by replacing each element with its characteristic.
Note that the connection sets are ordered according to the input integers
(1, 2, 3, 4, 5) and that $x_{i,1} < x_{i,2} < x_{i,3}$ for each i. Table II shows the
permutations derived from the $C_i$ for the subnetworks in the input and output
stages, using the relations for $\pi_j^{-1}(a)$ and $\pi_k(a)$ given in Section 3.2.2.
TABLE I—REDUCED CONNECTION SETS AND THEIR CORRESPONDING PERMUTATIONS
[Columns i, $C_i$, and $P_i$ for i = 1 to 5; the entries are not legible in the source scan.]

TABLE II—PERMUTATIONS FOR SUBNETWORKS IN THE INPUT AND OUTPUT STAGES
[Entries not legible in the source scan.]
IV. THE CONTROL ALGORITHM FOR THE BASE-2 NETWORK 


The selection of output integers from the sets $S_l$, $1 \le l \le N/d$, to
form the connection sets $C_i$, $1 \le i \le d$, is by no means simple. One
procedure has been reported by V. I. Neiman.?? By modifying the 
previous example, it can be shown that the selection cannot be made
on a strictly sequential basis. Let the given permutation be modified
such that the sets $S_1$, $S_2$, and $S_3$ are as follows:

$S_1 = \{11, 10, 4, 2, 6\}$,
$S_2 = \{1, 7, 5, 8, 9\}$,
$S_3 = \{12, 14, 3, 13, 15\}$.


If one had chosen the input-output pairs $\binom{1}{11}$ and $\binom{6}{1}$ to form $C_1$, where
the output integers 11 and 1 are from $S_1$ and $S_2$, respectively, there
would be no output integer in $S_3$ with a characteristic different
from those of the integers 11 and 1. Thus, in general, simultaneous selections
must be made in the construction of the connection sets. For networks 
with base-2 structure, however, the selection is reduced to a mere 
binary choice, resulting in a much simpler algorithm. For the other 
extreme case,"’ where d = N/2, the difficulty described above will 
not arise because there are only two sets, $S_1$ and $S_2$, each containing
exactly d integers.
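The dead end in the modified example is easy to confirm mechanically. The snippet below (illustrative only, base d = 5) takes the two outputs 11 and 1 already placed in $C_1$ and looks for an output in $S_3$ whose characteristic differs from both.

```python
def characteristic(a: int, d: int = 5) -> int:
    """Characteristic [(a + d - 1)/d] of a 1-based terminal."""
    return (a + d - 1) // d


s3 = [12, 14, 3, 13, 15]                          # S_3 of the modified permutation
used = {characteristic(11), characteristic(1)}    # characteristics already in C_1
candidates = [y for y in s3 if characteristic(y) not in used]
print(used, candidates)
```

Every output in $S_3$ has characteristic 1 or 3, so `candidates` is empty and the earlier choices would have to be undone.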

For an (N × N) network with a base-2 structure, the control
algorithm for setting the β-elements to realize a given permutation P
consists of three parts: (i) decomposing P into reducible connection
sets $C_1$ and $C_2$; (ii) reducing $C_1$ and $C_2$ to $P_1$ and $P_2$, respectively;
and (iii) setting the β-elements in the input and output stages. Since
the network has an iterative structure, the same procedure is applied
to each of the (N/2 × N/2), (N/4 × N/4), ..., (2 × 2) subnetworks.
Assuming N is a power of 2, an (N × N) network has $\log_2 N$ levels,
the last level of (2 × 2) subnetworks being a trivial case.

A coding scheme for the input and output integers which facilitates 
the required operations and two methods for decomposing P are 
described in the following sections. 


4.1 Coding Scheme 

It is clear from what has been discussed so far that the control 
algorithm essentially accepts the connection requirements as input 
data and, after processing, generates a set of output data which are 
used to rearrange the network. It is necessary, therefore, to have an
input/output (I/O) memory, which stores the output terminal $\pi(x_i)$ at
the address given by the numerical value of the input terminal $x_i$.
A simple coding scheme for these integers simplifies the
implementation of the algorithm.

Referring again to the network, one can, of course, use the set of
integers (0, 1, 2, ..., N − 1) to number the input and output terminals
without loss of generality in all the previous discussions.*
Then the familiar binary code can be used directly, both for the
input $x_i$ as the address and for the output $\pi(x_i)$ as the contents at $x_i$. We
shall now show how this code can be used at all $\log_2 N$ levels. Let the
binary representation be

$$b_{n-1} b_{n-2} \cdots b_1 b_0,$$

where $n = \log_2 N$, assuming N is a power of 2. Beginning at the first
level of the network, the least significant bit of the address is used to
set the input β-element defined by the remaining n − 1 bits, and the least
significant bit of the contents is used to set the output β-element defined
by the remaining n − 1 bits of the contents. Moreover, the conversion from $C_1$ (or
$C_2$) to $P_1$ (or $P_2$) is accomplished by merely eliminating $b_0$ for each
coded output integer. Finally, for an output integer $\pi(x_i)$, the bit $b_0$ is
set to identify whether the particular $\pi(x_i)$ belongs to $P_1$ or $P_2$. For
an input integer, however, the most significant bit of the address, $b_{n-1}$,
indicates whether $x_i$ belongs to $P_1$ or $P_2$.

The same coding procedure is applied at each subnetwork, and the
I/O memory is partitioned (as part of the algorithm) in the appropriate
manner. In general, at the ith level of the network, $i \le \log_2 N$, the (i − 1)
least significant bits of a word in memory define the particular subnetwork,
of size $(N/2^{i-1} \times N/2^{i-1})$, and the $\log_2 N - (i - 1)$ most significant bits
define the output integer. The (i − 1) most significant bits of the address,
however, designate the subnetwork, and the remaining bits define
the input integer. An example is given in detail in Section 4.2.1 to
illustrate this.
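The bit-field bookkeeping above can be sketched as follows, assuming 0-based terminals, N = 2^n, and a stored word made of the n low-order bits $m_{n-1} \cdots m_0$ (the "used" flag $m_n$ is left out). The function name `split_word` is hypothetical. The subnetwork number read from the address and the one read from the contents should agree; the usage line decodes the word stored at address 16 in Table VI (Section 4.2.1).

```python
def split_word(address: int, content: int, level: int, n: int) -> tuple[int, int, int, int]:
    """Decode an I/O-memory entry at a given level (1-based) of a 2**n x 2**n network.

    Returns (subnetwork from address, local input, subnetwork from content, local output).
    The (level - 1) most significant address bits name the subnetwork directly, while
    the (level - 1) least significant content bits name it in reversed bit order.
    """
    k = level - 1
    sub_addr = address >> (n - k)
    local_in = address & ((1 << (n - k)) - 1)
    low = content & ((1 << k) - 1)               # bits m_{k-1} ... m_0
    sub_content = 0
    for bit in range(k):                         # read them as m_0 m_1 ... m_{k-1}
        sub_content = (sub_content << 1) | ((low >> bit) & 1)
    local_out = content >> k
    return sub_addr, local_in, sub_content, local_out


# Third level of the (32 x 32) example: address 16 holds m4..m0 = 1 0 1 0 1.
print(split_word(16, 0b10101, level=3, n=5))
```

This prints `(2, 0, 2, 5)`: both fields name (8 × 8) subnetwork 2, the local input is 0, and the local output is 5.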


4.2 Decomposition by Looping 

With d = 2, an integer set is reduced to an integer pair, consisting
of only two elements, and one is said to be the dual of the other. If one
continues to use the integers (0, 1, ..., N − 1) to number the input
and output terminals of an (N × N) network, then the integers a and
b constitute an integer pair if

$$\left[\frac{a}{2}\right] = \left[\frac{b}{2}\right] = l$$

for some integer l. The dual of a (or b) is denoted by $\bar a$ (or $\bar b$), and,
therefore,

$$\bar a = b \quad \text{and} \quad \bar b = a.$$

* Except that the definition of the integer set I(l, d) needs to be slightly
modified, as follows:

$$I(l, d) = \{a \mid [a/d] = l\}, \quad 0 \le l \le (N/d) - 1.$$


Moreover, any permutation is decomposed into only two reducible
connection sets, $C_1$ and $C_2$, and if $\binom{x_i}{\pi(x_i)} \in C_1$, then $\binom{\bar x_i}{\pi(\bar x_i)}$ and $\binom{x_j}{\pi(x_j)} \in C_2$
for some $x_j$, where $\pi(x_j) = \overline{\pi(x_i)}$ is the dual of $\pi(x_i)$. In coded form, the dual
is obtained merely by complementing the least significant bit.

One method of constructing $C_1$ and $C_2$ from P, called the looping procedure,
is to search the I/O memory for the required outputs. The
sequence starts by selecting the output $\pi(0)$ in the first location of
the memory, that is, the pair $\binom{0}{\pi(0)}$. The first (n − 1) bits ($a_{n-1} a_{n-2} \cdots a_1$)
of the address, which are all zeros, define the first β-element in the input
stage, and the last bit ($a_0$), which is also zero, sets that element to the
"straight through" state (see Fig. 5). The bits $m_{n-1} m_{n-2} \cdots m_1$ and
$m_0$ at this address define one of the β-elements in the output stage
and its setting, respectively. Bit $m_0$ is now reset to zero to designate
that this particular output ($m_{n-1} m_{n-2} \cdots m_1$) is for $P_1$. There is also
another bit, $m_n$, which is set to one when that particular output has
been placed in $P_1$ or $P_2$, so that an unused output can be selected when
necessary. The example in Section 4.2.1 will clarify this.

The memory is next scanned for the dual of $\pi(0)$; this output and
its address $x_j$ define $\binom{x_j}{\pi(x_j)}$ for $C_2$. For the input-output pairs in $C_2$,
it is only necessary to set $m_0$ and $m_n$ of the memory word to '1' to
signify that this word (output) has been used and that it is for $P_2$.
The output $\pi(\bar x_j)$ at the address $\bar x_j$ defines $\binom{\bar x_j}{\pi(\bar x_j)}$ and is designated
for $C_1$. The same looping procedure is continued until all $\pi(x_i)$,
$1 \le i \le N$, are assigned to either $C_1$ or $C_2$. In most cases, however,
the looping will end before all output integers are used. The procedure
is then started again by arbitrarily selecting an unassigned output for $P_1$,
found by examining $m_n$. Because of this arbitrariness, $C_1$ is, in general, not
unique. This fact is used to advantage in the other procedure, to be
described in Section 4.3.

[Figure: base-2 network with (N/2 × N/2) middle subnetworks; β-element states shown: straight through (1→1, 2→2) and crossover (1→2, 2→1).]
Fig. 5—Base-2 (N × N) rearrangeable switching network.

After the looping procedure is completed, the memory is reorganized
to have $P_1$ and $P_2$ in locations 0 to N/2 − 1 and N/2 to N − 1, respectively.
(A small scratch pad memory may be necessary.) Then the same
procedure is applied to each of these (N/2 × N/2) subnetworks, and
it is continued until all of the β-elements in the (N × N) network
are set.

The searching can be eliminated by employing two memories, one
of which is the I/O memory. The additional one is an output/input
(O/I) memory that stores the input $x_i$ at the address corresponding
to the numerical value of the output $\pi(x_i)$. The decomposition of P
into $P_1$ and $P_2$ is then achieved by crisscrossing between the two memories.
For example, output $\pi(x_i)$ and its corresponding address $x_i$ in the
I/O memory define $\binom{x_i}{\pi(x_i)}$ for $C_1$. Now the dual of $\pi(x_i)$, say $\pi(x_j)$,
is the address for the O/I memory, and the corresponding word $x_j$
defines $\binom{x_j}{\pi(x_j)}$ for $C_2$. Then $\bar x_j$ is used as the next address in the I/O memory.
If the I/O memory is a content addressable memory (CAM),’” the 
required $\pi(x_i)$'s for $C_1$ and $C_2$ are determined directly in the content
addressable mode, without the use of a second memory.
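A minimal sketch of one level of the looping procedure, assuming 0-based terminals and a Python list in place of the I/O memory (`perm[x]` is $\pi(x)$). The `inverse` list plays the role of the O/I memory, so no searching is needed. As described above, the dual of a terminal is obtained by complementing its least significant bit (`^ 1`); this sketch starts each new loop at the lowest unassigned input, which is one admissible choice among the arbitrary ones the text allows. The function name `loop_decompose` is hypothetical.

```python
def loop_decompose(perm: list[int]) -> tuple[list[int], list[int], list[int]]:
    """One level of the base-2 looping procedure on a permutation of 0..N-1.

    Returns (order, upper, lower): the inputs placed in C_1 in the order the loops
    visit them, and the permutations P_1, P_2 for the upper and lower subnetworks.
    """
    n = len(perm)
    inverse = [0] * n                    # O/I memory: inverse[pi(x)] = x
    for x, y in enumerate(perm):
        inverse[y] = x
    in_c1 = [None] * n                   # None: unassigned, True: C_1, False: C_2
    order = []
    for start in range(n):               # each new loop begins at the lowest free input
        x = start
        while in_c1[x] is None:
            in_c1[x] = True              # (x, pi(x)) goes to C_1
            order.append(x)
            partner = inverse[perm[x] ^ 1]   # input carrying the dual output ...
            in_c1[partner] = False           # ... goes to C_2
            x = partner ^ 1                  # and the dual of that input goes to C_1
    upper = [0] * (n // 2)
    lower = [0] * (n // 2)
    for x, y in enumerate(perm):
        # Dropping b_0 of the input gives the beta-element; dropping b_0 of the
        # output gives the terminal of the (N/2 x N/2) subnetwork.
        (upper if in_c1[x] else lower)[x >> 1] = y >> 1
    return order, upper, lower


table_iii = [14, 23, 3, 20, 28, 2, 16, 11, 31, 29, 1, 27, 12, 18, 5, 24,
             26, 21, 0, 13, 8, 6, 19, 15, 4, 30, 7, 22, 17, 10, 25, 9]
order, upper, lower = loop_decompose(table_iii)
print(order[:14])
```

With the permutation of Table III (Section 4.2.1), the first loop visits inputs 0, 22, 12, 18, 11, 17, 2, 4, 8, 24, 15, 31, 21, 27, the same order as the "looping sequence" column, and the second loop starts at input 6. Applying the function again to `upper` gives [3, 4, 7, 6, 0, 1, 5, 2], the output integers in the first eight rows of Table VI.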


4.2.1 An Example of the Looping Procedure 

In order to illustrate in a meaningful way the looping procedure
using the coding scheme at various levels, a permutation for a
(32 × 32) network, given in Table III, will be used.

The looping procedure begins with $\pi(0) = 14$ and continues to
select input-output pairs for the connection set $C_1$. (The sequence of
this selection is indicated by the number in the "looping sequence"
column.) As each output integer is selected for $C_1$, the last bit $m_0$ is
used to set the output β-element designated by ($m_4 m_3 m_2 m_1$) (see
Fig. 6). Then the bit $m_0$ is set to '0' to indicate that it is for $C_1$, and
the bit $m_5$ is set to '1' to indicate that it has been used. The bits $m_0$ and
$m_5$ of the output integers for $C_2$ are merely set to '1'. In this particular
TABLE III—MEMORY CONTENTS ON THE INTERCONNECTIONS

Input       Output      Coded Output Integers     Looping
Terminals   Terminals   m5 m4 m3 m2 m1 m0         Sequence
 0          14           0  0  1  1  1  0          1
 1          23           0  1  0  1  1  1
 2           3           0  0  0  0  1  1          7
 3          20           0  1  0  1  0  0
 4          28           0  1  1  1  0  0          8
 5           2           0  0  0  0  1  0
 6          16           0  1  0  0  0  0
 7          11           0  0  1  0  1  1
 8          31           0  1  1  1  1  1          9
 9          29           0  1  1  1  0  1
10           1           0  0  0  0  0  1
11          27           0  1  1  0  1  1          5
12          12           0  0  1  1  0  0          3
13          18           0  1  0  0  1  0
14           5           0  0  0  1  0  1
15          24           0  1  1  0  0  0         11
16          26           0  1  1  0  1  0
17          21           0  1  0  1  0  1          6
18           0           0  0  0  0  0  0          4
19          13           0  0  1  1  0  1
20           8           0  0  1  0  0  0
21           6           0  0  0  1  1  0         13
22          19           0  1  0  0  1  1          2
23          15           0  0  1  1  1  1
24           4           0  0  0  1  0  0         10
25          30           0  1  1  1  1  0
26           7           0  0  0  1  1  1
27          22           0  1  0  1  1  0         14
28          17           0  1  0  0  0  1
29          10           0  0  1  0  1  0
30          25           0  1  1  0  0  1
31           9           0  0  1  0  0  1         12


example, the looping procedure ends prematurely and leaves the connection
set

$$\begin{pmatrix} 6 & 7 & 28 & 29 \\ 16 & 11 & 17 & 10 \end{pmatrix}$$

unassigned, as indicated in Table IV.

One then repeats the looping procedure, starting arbitrarily at
one of the remaining output integers (as indicated by $m_5 = 0$). After every
output integer has been assigned to $C_1$ (or $C_2$), one rearranges the
memory such that all the integers with $m_0 = 0$ occupy the upper half
of the memory, corresponding to the connections through the upper
subnetwork, with the remaining output integers for the lower subnetwork. This
is shown in Table V.
Fig. 6—A (32 × 32) rearrangeable network with partial setting of β-elements.


Table VI shows the memory contents after another looping procedure
is applied to the output integers (for both the upper and lower
subnetworks) and subsequent rearranging. At this level, $m_1 m_0$ denotes the
(8 × 8) subnetwork, and if one reverses the order to $m_0 m_1$, it is just
the binary representation of natural ascending numbers, 0 being the
uppermost (8 × 8) subnetwork and 3 being the lowest (8 × 8) subnetwork.
One could also obtain the same information from the addresses,
since the contents are rearranged into this order.


4.3 Decomposition with a Trial-Partition Procedure 
A second method for decomposing P, which incorporates a trial-and-error
procedure, is presented here. Although this method is practical only
for small networks, it is very important because of its simple implementation,
and intelligence can be included to reduce the average
processing time by taking advantage of the fact that the decomposition
may not be unique.


From the derivation of the reducible connection sets $C_1$ and $C_2$,
it is seen that each must contain one and only one $\pi(x_i)$ from each
$S_l$, $0 \le l \le (N/2) - 1$, together with the corresponding input $x_i$. Since $\binom{0}{\pi(0)}$ and
$\binom{1}{\pi(1)}$ are defined to be in $C_1$ and $C_2$, respectively, only (N/2 − 1)
additional input-output pairs must be selected for $C_1$; the remaining
pairs are for $C_2$. After $C_1$ and $C_2$ are determined, $P_1$ and $P_2$ and the
rearranging of the I/O memory are derived in the same manner as
in the looping procedure. Let Ψ be the set of connection sets, each of which
includes (N/2 − 1) input-output pairs formed by taking one $\pi(x_i)$ from


TABLE IV—MEMORY CONTENTS AFTER SEQUENCING THROUGH ONE LOOP
TABLE V—MEMORY CONTENTS AFTER REARRANGING

m5 m4 m3 m2 m1 m0
 0  0  1  1  1  0
 0  0  0  0  1  0
 0  1  1  1  0  0
 0  1  0  0  0  0
 0  1  1  1  1  0
 0  1  1  0  1  0
 0  0  1  1  0  0     upper
 0  1  1  0  0  0     subnetwork
 0  1  0  1  0  0
 0  0  0  0  0  0
 0  0  0  1  1  0
 0  1  0  0  1  0
 0  0  0  1  0  0
 0  1  0  1  1  0
 0  0  1  0  1  0
 0  0  1  0  0  0
 0  1  0  1  1  1
 0  1  0  1  0  1
 0  0  0  0  1  1
 0  0  1  0  1  1
 0  1  1  1  0  1
 0  0  0  0  0  1
 0  1  0  0  1  1     lower
 0  0  0  1  0  1     subnetwork
 0  1  1  0  1  1
 0  0  1  1  0  1
 0  0  1  0  0  1
 0  0  1  1  1  1
 0  1  1  1  1  1
 0  0  0  1  1  1
 0  1  0  0  0  1
 0  1  1  0  0  1





each $S_l$, l > 0. Since there are two elements in each $S_l$, Ψ consists of
$2^{(N/2)-1}$ connection sets. For any arbitrarily selected $C \in \Psi$, the test
will be only on the output integers. Therefore, C, together with $\binom{0}{\pi(0)}$,
defines $C_1$ if and only if, for any $\pi(x_i)$ in it, its dual is not also in it.

These ideas lead to the application of a finite-state machine with
$2^{(N/2)-1}$ states, which are used to generate Ψ. This machine, called the
Trial-Partition Machine (TPM), is composed of (N/2 − 1) two-state
storage devices (flip-flops), each of which represents an input integer
pair. The "0" or "1" state of a flip-flop designates that the odd
or even input, respectively, of the input integer pair, together with the
corresponding output, is in C. If $\sigma_i$ is a state of the TPM, then it defines
C, and the $\pi(x_i) \in C$ are tested for an integer pair (two outputs with
the same characteristic). If C contains no such pair, then it is reducible
and can be used as $C_1$. However, if there is at least one output integer
pair in C, the TPM is advanced to $\sigma_{i+1}$. Since the outputs are stored
serially in the I/O memory, the memory must be sequenced in order to
perform the test for each $\sigma_i$.

This type of TPM can be easily implemented with an (N/2 − 1)-bit
binary counter. However, it may be desirable to have a more intelligent
machine so that $C_1$ can be determined in less time (with fewer trials).
Of course, the complexity and, consequently, the cost of the TPM
will increase as the intelligence of the machine grows. One way to


TABLE VI—MEMORY CONTENTS AT THE THIRD LEVEL (AFTER TWO
LOOPING PROCEDURES)

m5 m4 m3 m2 m1 m0
 0  0  1  1  0  0
 0  1  0  0  0  0
 0  1  1  1  0  0
 0  1  1  0  0  0
 0  0  0  0  0  0
 0  0  0  1  0  0
 0  1  0  1  0  0
 0  0  1  0  0  0
 0  0  0  0  1  0
 0  1  1  1  1  0
 0  1  1  0  1  0
 0  0  1  1  1  0
 0  1  0  1  1  0
 0  1  0  0  1  0
 0  0  0  1  1  0
 0  0  1  0  1  0
 0  1  0  1  0  1
 0  0  0  0  0  1
 0  1  1  1  0  1
 0  1  0  0  0  1
 0  0  1  1  0  1
 0  0  1  0  0  1
 0  0  0  1  0  1
 0  1  1  0  0  1
 0  1  0  1  1  1
 0  0  1  0  1  1
 0  0  0  0  1  1
 0  0  0  1  1  1
 0  1  1  0  1  1
 0  0  1  1  1  1
 0  1  1  1  1  1
 0  1  0  0  1  1
enhance the intelligence is to count the number of output integer
pairs in C. This number (Z) specifies how many pairs $\binom{x_i}{\pi(x_i)}$ must be changed
in order to have all $\pi(x_i)$ from different output integer pairs. Then the
next set (C′) is selected from those sets which contain at least Z different
pairs and have not been previously tested. Another method is to observe
the input-output pairs $\binom{x_i}{\pi(x_i)}$ and $\binom{x_j}{\pi(x_j)}$ that form an output integer
pair, and to select a C′ that contains either $\bar x_i$ or $\bar x_j$. For an example of
a TPM, see Ref. 11.

The TPM is applicable only to small networks because the number
of tests becomes prohibitive as N increases. There are $2^{(N/2)-1}$ connection
sets in Ψ and, on the average, half of these must be tried. A
TPM with intelligence will reduce the number of sets to
be tested, but for networks larger than (64 × 64) this will still be too
time-consuming.
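The unintelligent TPM, an (N/2 − 1)-bit binary counter, can be sketched as follows for 0-based terminals. Two details are assumptions of this sketch rather than statements from the paper: bit e − 1 of the counter selects input 2e + bit from the input pair (2e, 2e + 1), and the fixed pair $\binom{0}{\pi(0)}$ is included when testing for output integer pairs. The function name `trial_partition` is hypothetical.

```python
def trial_partition(perm: list[int]) -> tuple[list[int], int]:
    """Find the inputs of a reducible C_1 by stepping a binary counter (the TPM).

    Returns (inputs of C_1, number of counter states tried).
    """
    half = len(perm) // 2
    for state in range(1 << (half - 1)):        # 2**(N/2 - 1) counter states
        chosen = [0]                            # pair (0, pi(0)) is always in C_1
        for e in range(1, half):                # one flip-flop per remaining input pair
            chosen.append(2 * e + ((state >> (e - 1)) & 1))
        # Reducible iff no two chosen outputs form an output integer pair,
        # i.e. the outputs are all different once b_0 is dropped.
        if len({perm[x] >> 1 for x in chosen}) == half:
            return chosen, state + 1
    raise ValueError("input is not a permutation of 0..N-1 with N even")


inputs, trials = trial_partition([2, 3, 0, 1])
print(inputs, trials)
```

Under this bit convention, the counter needs 21,205 trials on the (32 × 32) permutation of Table III before it reaches a reducible $C_1$, which illustrates why the text confines the unintelligent TPM to small networks.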


4.4 Using Combinational Logic 


The control algorithm for the (4 × 4) network can be implemented
with combinational logic. This is achieved by assigning the
states of the five β-elements for each of the 24 possible input-output
permutations. The combinational logic, which consists of 13 NAND
gates, determines the correct setting from the outputs, which are
coded in binary. This method is practical only for very small
networks because the number of permutations (N!) grows very rapidly. The
(4 × 4) network is important, however, because it could be used as
a building block to construct larger networks, and the combinational
logic and the β-elements could be on the same semiconductor chip.
The same idea also applies to an (8 × 8) or (16 × 16) network using
the TPM.
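The table-lookup idea behind the combinational-logic method can be sketched in software by enumerating all 2^5 settings of the five β-elements and recording, for each realized permutation, one setting that produces it. The wiring is an assumption of this sketch (0-based terminals; input element 0 fixed straight through, as in the elimination of Section 3.2.1; output a of input element j feeds input j of middle element a; output k of middle element a feeds input a of output element k). It is not a reproduction of the 13-NAND-gate circuit.

```python
from itertools import product


def route(settings: tuple[int, int, int, int, int]) -> tuple[int, ...]:
    """Permutation realized by the (4 x 4) network; 0 = straight, 1 = crossover.

    settings = (s1, t0, t1, u0, u1): input element 1, middle elements 0 and 1,
    output elements 0 and 1.  Input element 0 is fixed straight through.
    """
    s1, t0, t1, u0, u1 = settings
    s, t, u = (0, s1), (t0, t1), (u0, u1)
    out = []
    for x in range(4):
        j, local = x >> 1, x & 1
        a = local ^ s[j]      # leaves input element j toward middle element a
        k = j ^ t[a]          # leaves middle element a toward output element k
        b = a ^ u[k]          # leaves output element k on its port b
        out.append(2 * k + b)
    return tuple(out)


# Lookup table: realized permutation -> first setting found that realizes it.
table: dict[tuple[int, ...], tuple[int, ...]] = {}
for settings in product((0, 1), repeat=5):
    table.setdefault(route(settings), settings)

print(len(table))
```

All 24 permutations appear among the 32 settings, so a fixed 24-entry table (or logic equivalent to it) suffices; some permutations have two valid settings, reflecting the non-uniqueness noted in Section 4.2.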


4.5 System Description 


The block diagram of a rearrangeable switching system with only 
one memory (I/O Memory) is shown in Fig. 7. The Network Control 
realizes the control algorithm by employing one or more of the above 
methods. For example, it may be advantageous to use the combina- 
tional logic and TPM for the small networks (or subnetworks) because 
they are relatively inexpensive to implement. However, as the size of 
the network grows, the processing time becomes critical. Then the 
looping procedure with one memory, and with two memories (or a content
addressable memory), may be used for fewer than 1000 terminals and more than
1000 terminals, respectively. The timing considerations for these
methods are discussed in the next section. 
[Block diagram: System Control, Network Control, I/O Memory, Scratch Pad Memory, Network Buffer, and (N × N) Switching Network.]

Fig. 7—System block diagram.


The Scratch Pad Memory temporarily stores the outputs during 
decomposition of P and also while the I/O Memory is being par- 
titioned. The System Control generates the timing and control 
sequences for all of the operations. This unit, as well as the Network 
Control, could be implemented with stored program techniques if it 
is economical and if there is sufficient real time. 

The other important unit, called the Network Buffer, isolates the 
network from the Network Control so that the existing traffic will not 
be affected during the processing time of the control algorithm. As the 
settings of the β-elements are determined, they are stored in the
Network Buffer. After the control algorithm is completed, the states
of the β-elements are set within the time required to ensure the quality
of the transmission. This time is dictated by the switching transients
of the β-elements, the network terminations, and the application. Serial
shift registers can provide economical buffers if the β-elements can
be set in a stage-by-stage sequence.


V. TIMING CONSIDERATIONS


In this section, the processing time and the necessary equipment for 
the various methods of implementing the control algorithm are dis- 
cussed. The processing time for the algorithm must be sufficiently short 
to accommodate the traffic changes. The combinational logic method 
is the fastest; however, it is applicable only for very small networks. 
The TPM has the next order of complexity, and it is applicable for
(64 × 64) networks or smaller. The processing time, which increases
exponentially, is derived as follows: For an (N × N) network, there
are 2^{(N/2)−1} states of the TPM. If the TPM is a binary counter (no
intelligence), on the average about half of the states, or 2^{(N/2)−2}, must
be tested before the given P is decomposed to C_1 and C_2. Also, on the
average, one-half of the I/O memory is scanned before an output
integer pair is detected. Therefore, the number (A_T) of times that the
I/O memory is accessed for log_2 N levels [the (log_2 N)th level is the
trivial (2 × 2) network, and the number of times the memory is ac-
cessed is simply N/2] is:


00 | 


logaN~1 N 
Aa Do ee aes (1) 

i=1 

The processing time can be greatly reduced by using the looping
procedure. If a random-access I/O memory is employed, it is necessary
to search for one-half of the outputs (the remaining outputs are
obtained directly from memory) on each level of the network. For an
(N × N) network, N/2 outputs are determined by accessing the I/O
memory, on the average, N/2 times each. Then


$$A_T = N^2 \sum_{i=1}^{\log_2 N} \left(\tfrac{1}{2}\right)^{i+1} = \tfrac{1}{2}N(N-1). \qquad (2)$$

In addition, the access time required to partition the memory is 2N
log_2 N and should be included in equations (1) and (2) to give the
total processing time.

If two memories (in the crisscross manner) or a CAM is utilized, 
no searching is necessary, and access times for the decomposition of P 
and partitioning of memories are: 


$$A_P = 4N \log_2 N \quad \text{and} \quad A_T = 8N \log_2 N, \qquad (3)$$

respectively.


For N = 16,384 and a CAM with 1-μsec access time, the control
algorithm, implemented with wired logic and a 10-MHz basic clock,
can be accomplished in approximately 750 msec. If two memories
(random access) with 1-μsec cycle time are used, a processor with
an instruction execution time of 3 μsec can implement the control
algorithm in approximately 50 seconds.
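As a rough check on equation (2), the sketch below evaluates the level-by-level sum for the looping procedure with one random-access memory, compares it with the closed form N(N − 1)/2, and adds the 2N log_2 N partitioning accesses. It assumes N is a power of 2 and counts memory accesses only. The function names are hypothetical, and no timing model (clock rate, instruction time) is implied.

```python
def looping_accesses(N):
    """A_T for the looping procedure with one random-access I/O memory.

    Follows equation (2): level i (1 <= i <= log2 N) contributes
    N**2 * (1/2)**(i + 1) accesses. N must be a power of 2.
    """
    n = N.bit_length() - 1  # log2 N when N is a power of 2
    if N < 2 or 1 << n != N:
        raise ValueError("N must be a power of 2, N >= 2")
    # (N * N) >> (i + 1) is exact because N * N = 2**(2n) and i + 1 <= 2n.
    return sum((N * N) >> (i + 1) for i in range(1, n + 1))


def partition_accesses(N):
    """Additional accesses to partition the memory: 2 N log2 N."""
    return 2 * N * (N.bit_length() - 1)


for N in (16, 1024, 16384):
    # columns: N, A_T from the sum, closed form N(N-1)/2, partitioning term
    print(N, looping_accesses(N), N * (N - 1) // 2, partition_accesses(N))
```

The quadratic term dominates: for N = 16,384 the sum gives 134,209,536 accesses, while partitioning adds only 458,752.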


VI. CONCLUSION 


In this paper, the network structure and control algorithm for cer-
tain (N × N) rearrangeable switching networks are described. The
algorithm consists of decomposing a given permutation into d (where
d is the base of the network) permutations for the d (N/d × N/d)
subnetworks, and determining the connections for the N/d (d × d)
networks in the input and output stages. The same procedure is
applied to the subnetworks in an iterative manner until all of the
connections in the (N × N) network are defined.

Although the network can be constructed with various building 
blocks (bases), the base-2 structure is the most important because it 
requires the least number of two-state devices (B-elements) and the 
control algorithm is relatively simple. The algorithm is implemented 
by performing the decomposition either by the looping procedure or 
by a Trial-Partition Machine. Also, an efficient coding scheme is de- 
fined to facilitate the decomposition of the permutation and the par- 
titioning of the memory. With the base-2 structure, the control 
algorithm and the coding scheme can be used in a consistent manner 
at each level of the network. There are other classes of network struc- 
tures with the same number of B-elements, such as the nested-tree net- 
works,* that do not have this property. 

The processing time and equipment complexity vary with the 
methods of implementation. The combinational logic is the fastest and 
least expensive; however, it is only applicable for very small networks. 
The Trial-Partition Machine is economical, but it is too slow for large 
networks; however, intelligence can be designed into the machine 
taking advantage of the fact that for a large number of permutations 
there are more ways than one of setting the network. Consequently, 
the processing time is dependent on the permutations given, as well as 
the amount of intelligence built in. The most suitable method for 
networks larger than (64 X 64) is the looping procedure with a con- 
tent addressable memory to store the outputs. The processing time is 
independent of the permutations given. With this method, it is pos-
sible to determine the setting of all the B-elements for a (16,384 ×
16,384) network in less than one second for any number of new con-
nections or terminations. During the processing time, the new settings
for B-elements are stored in a buffer; then their states are changed. 
The memory required for this system is about 300k bits or approxi- 
mately 20 bits per input terminal. 

This paper has described a control algorithm for a rearrangeable 
switching network that is practical from both the system and process-
ing time viewpoints. The application of this network should be con-
sidered where full access and nonblocking are required, and rerouting
is possible. 
On a Class of Rearrangeable Switching
Networks 


Part II: Enumeration Studies and
Fault Diagnosis 


By D. C. OPFERMAN and N. T. TSAO-WU 
(Manuscript received December 1, 1970) 


The decomposition of permutations as used in the control algorithm 
for a class of rearrangeable switching networks is proved. Enumera- 
tion studies on permutations related to the network are presented. 
Theorems for constructing a set of traffic patterns for diagnostic pur- 
poses are also given. Finally, a procedure for detecting and locating 
faulty switching elements in the network is described. 


I. INTRODUCTION 


This part of the paper will cover some of the theoretical considera- 
tions related to the rearrangeable switching networks discussed in 
Part I. For the general (N × N) network with base-d structure, it is
shown that it can indeed accommodate any of the N! connection
patterns. A thorough study is then made of the (N × N) network
having a base-2 structure. It was pointed out in Part I that the
setting of the B-element is, in general, not unique for an arbitrary
input-output permutation. Furthermore, the number of B-elements for
an (N × N) network exceeds log_2(N!), for N > 4. Some enumera-
tion studies are given to account for this. Finally, fault diagnostic
studies are given in relation to the base-2 network. A method to con- 
struct a set of permutations useful for testing the network is developed. 
This is then followed by discussing a procedure to detect and/or locate 
faulty B-elements in the network. 
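The claim that the network has more B-elements than log_2(N!) can be checked numerically. The sketch below assumes the base-2 structure summarized in the conclusion of Part I: N/2 elements in each of the input and output stages around two (N/2 × N/2) subnetworks, which gives N log_2 N − N/2 elements. ⌈log_2(N!)⌉ is computed exactly with integer arithmetic. The function names are illustrative.

```python
from math import factorial


def b_elements(N):
    """Number of B-elements in the base-2 (N x N) network.

    Assumes the recursive structure: N/2 (2 x 2) elements in the input
    stage, N/2 in the output stage, and two (N/2 x N/2) subnetworks,
    down to a single element when N = 2.
    """
    if N == 2:
        return 1
    return N + 2 * b_elements(N // 2)


def min_elements(N):
    """ceil(log2(N!)): fewest two-state elements able to label N! patterns."""
    return (factorial(N) - 1).bit_length()


for N in (4, 8, 16, 64, 1024):
    n = N.bit_length() - 1
    # columns: N, recursive count, closed form N log2 N - N/2, lower bound
    print(N, b_elements(N), N * n - N // 2, min_elements(N))
```

For N = 8 this gives 20 B-elements against a lower bound of 16. The surplus is what allows more than one setting for the same permutation.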


II. PERMUTATION PROPERTY OF THE NETWORK


In this section it will be shown that the decomposition of the given
permutation into reducible connection sets (as used in the control
algorithm) is always possible. From Section 3.2.1 of Part I, it is evi-
dent that the decomposition is equivalent to the selection of d sets of
output integers, π(x_i), one from each S_l, such that all the output
integers in any of the d sets have distinct characteristics, where S_l,
as previously defined, is

$$S_l = \{\pi(x) \mid [(x + d - 1)/d] = l\}, \qquad 1 \le l \le N/d;$$

and there are d elements in each S_l.

To show that this selection and, therefore, the decomposition can 
always be done, P. Hall’s Theorem* on Distinct Representatives is 
used and is stated as follows: 


P. Hall’s Theorem: Let L be a finite set of indices L = {1, 2, ..., n}.
For each l ∈ L, let T_l be a subset of a set T. A necessary and sufficient
condition for the existence of distinct representatives t_l, l = 1, 2, ..., n,
t_l ∈ T_l, t_i ≠ t_j when i ≠ j, is that for every k = 1, 2, ..., n and every
choice of k distinct indices l_1, l_2, ..., l_k, the subsets T_{l_1}, T_{l_2}, ..., T_{l_k}
contain between them at least k distinct elements.
This theorem can be used directly if a mapping φ is defined on the
sets S_l as follows:

$$\varphi: S_l = \{\pi(x)\} \to T_l = \{t\}, \qquad 1 \le l \le N/d,$$

where t = [(π(x) + d − 1)/d]* is the characteristic of π(x).

This simply means that each integer in S_l is replaced by its charac-
teristic t, with T_l having exactly the same number of elements as S_l.
Thus, the selection of d sets of π(x_i), one from each S_l, such that the
integers in each of the d sets have distinct characteristics, is equiva-
lent to the selection of d sets of N/d distinct representatives, one from
each T_l, such that in each of the d sets t_i ≠ t_j for i ≠ j.

By Hall’s theorem, it is sufficient to show that for every k = 1, 2, ...,
N/d and choice of k distinct indices l_1, l_2, ..., l_k, the sets T_{l_1}, T_{l_2}, ...,
T_{l_k} contain between them at least k distinct elements. But this is
clearly the case here, since each set S_l, and, therefore, each set T_l,
contains exactly (d − j) elements after j sets have been so selected,
0 ≤ j ≤ d − 1. Thus, there are k(d − j) elements in the sets T_{l_1},
T_{l_2}, ..., T_{l_k}, of which at most (d − j) elements are identical (derived
from the fact that there are at most (d − j) output integers belonging
to the same integer set after j sets have been so selected). Therefore,
there are at least k(d − j)/(d − j) = k distinct elements. The index j is
introduced to show that the selection of d sets of π(x_i) can be made
on a sequential basis.

* [z] is the integral value of z.
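The sequential selection in this proof can be carried out with any bipartite-matching routine. The sketch below uses Kuhn's augmenting-path method to choose the d sets one at a time. This is a standard technique swapped in for illustration, not the authors' control algorithm. In each round, input switch l is matched to an output characteristic t through some remaining input of S_l, and Hall's condition guarantees that every round succeeds. The dictionary representation and all names are assumptions.

```python
import random


def characteristic(z, d):
    """[(z + d - 1)/d]: the (d x d) switch that 1-based integer z sits on."""
    return (z + d - 1) // d


def _augment(l, inputs_on, perm, d, match, visited):
    """Kuhn's augmenting-path step: find an output characteristic for switch l."""
    for x in inputs_on[l]:
        t = characteristic(perm[x], d)
        if t in visited:
            continue
        visited.add(t)
        # t is free, or the switch currently holding t can move elsewhere
        if t not in match or _augment(characteristic(match[t], d),
                                      inputs_on, perm, d, match, visited):
            match[t] = x
            return True
    return False


def split_into_d_sets(perm, d):
    """Split perm (dict input -> output, 1-based) into d connection sets.

    Each set takes one input from every input switch l (one element of
    every S_l) and reaches outputs with pairwise distinct characteristics.
    """
    inputs_on = {l: [x for x in sorted(perm) if characteristic(x, d) == l]
                 for l in range(1, len(perm) // d + 1)}
    sets = []
    for _ in range(d):
        match = {}  # output characteristic t -> chosen input x
        for l in inputs_on:
            if not _augment(l, inputs_on, perm, d, match, set()):
                raise ValueError("no system of distinct representatives")
        chosen = set(match.values())
        sets.append({x: perm[x] for x in sorted(chosen)})
        for l in inputs_on:  # every S_l now has one element fewer
            inputs_on[l] = [x for x in inputs_on[l] if x not in chosen]
    return sets


rng = random.Random(1)
images = list(range(1, 28))
rng.shuffle(images)
perm = dict(zip(range(1, 28), images))  # N = 27, d = 3
for s in split_into_d_sets(perm, 3):
    print(sorted(s.items()))
```

Each round removes one element from every S_l, so after j rounds the remaining structure is (d − j)-regular. This matches the counting argument used in the proof.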


III. SOME RESULTS ON ENUMERATIONS


For the remaining sections, the discussion will be restricted to the 
(N x N) network with base-2 structure. Some definitions (in addition 
to those in Part I) relevant to the enumeration study as well as the 
network diagnosis are given first. 


3.1 Definitions 


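(i) For any given connection set C, C ⊂ P, having input-output
pairs (the outputs are denoted as y_i instead of π(x_i) to simplify the
notation),

$$C = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \\ y_1 & y_2 & \cdots & y_m \end{pmatrix}, \qquad 1 \le m \le N,$$

there exists an inverse of C, denoted by C^{-1}, which is a connection set

$$C^{-1} = \begin{pmatrix} y_1 & y_2 & \cdots & y_m \\ x_1 & x_2 & \cdots & x_m \end{pmatrix}.$$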

(ii) For any two connection sets C_i and C_j that have the same set
of input (x_i) and output (y_i) integers, the product C_i C_j^{-1} and its cycles
can be defined in the standard manner, similar to that usually as- 
sociated with permutations.” 

(iii) A loop L is a connection set where, for any $\binom{x_i}{y_i} \in L$,

$$\binom{\hat{x}_i}{y_j} \in L \quad \text{and} \quad \binom{x_k}{\hat{y}_i} \in L,$$

x̂ and ŷ denoting the duals of x and y. The number of these input-output
pairs in L is called the order of L, which is necessarily even. Moreover,
all loops are distinct, i.e., any two loops do not have any common
input-output pair.

(iv) A proper loop is a loop in which the input-output pairs are
arranged so that both x and x̂ and y and ŷ are adjacent in a circular
sense, e.g.,

$$L = \begin{pmatrix} x_1 & \hat{x}_1 & x_2 & \hat{x}_2 & \cdots & x_k & \hat{x}_k \\ \hat{y}_k & y_1 & \hat{y}_1 & y_2 & \cdots & \hat{y}_{k-1} & y_k \end{pmatrix}.$$

This can always be done, and any loop is considered to be a proper
loop, unless otherwise specified. Any permutation P on N integers
can be written as

P = (L_1, L_2, ..., L_m)

and is said to have m loops, 1 ≤ m ≤ N/2.
(v) A loop L, of order 2k, is said to be decomposed into two inde-
pendent connection sets C_1 and C_2, C_1, C_2 ⊂ L, if for any pair $\binom{x_i}{y_i} \in C_1$,

$$\binom{\hat{x}_i}{y_j} \in C_2 \quad \text{and} \quad \binom{x_k}{\hat{y}_i} \in C_2.$$

(vi) The derived sets Q_1 and Q_2 (obtained from independent con-
nection sets C_1 and C_2, respectively, by replacing every integer by its
characteristic) are denoted by (Q_1, Q_2). If C_1 and C_2 are reducible,
Q_1 and Q_2 are permutations P_1 and P_2, respectively, and they are referred
to as derived permutations.


3.2 Enumeration of Permutations by Loops

In terms of the definitions just given, the looping procedure for the
control of the (N × N) network, described in Section 4.2 of Part I, is
equivalent to arranging the given permutation having m loops into
the form

P = (L_1, L_2, ..., L_m),   1 ≤ m ≤ N/2,

and decomposing it to two reducible connection sets C_1 and C_2 by
grouping the alternate input-output pairs from each loop into C_1 and
the remaining into C_2. Since the decomposition is not unique if m > 1,
it is readily seen that for any permutation with m loops, there are
2^m possible ways of decomposition. This leads naturally to the ques-
tion of how many of the N! permutations have m loops, 1 ≤ m ≤ N/2.


The following lemmas and theorems will establish a natural relation 
between cycles and loops. The enumeration of permutations with m 
loops (1 ≤ m ≤ N/2) can be expressed in terms of that of cycles,
which have been well studied.’ 


Lemma 1: The derived sets (Q_1, Q_2) of a loop L of order 2k have the
same set of integers x_i and y_i, and the product Q_1 Q_2^{-1} has one cycle of
length k.


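Proof: Indeed, by definition (v), for every input (or output) integer
in C_1, its dual is in C_2; and since they have the same characteristic,
Q_1 and Q_2 have the same set of input (or output) integers. Further-
more, since L is a loop, C_1 and C_2 are of the following form

$$C_1 = \begin{pmatrix} x'_1 & x'_2 & \cdots & x'_k \\ \hat{y}'_k & \hat{y}'_1 & \cdots & \hat{y}'_{k-1} \end{pmatrix}, \qquad C_2 = \begin{pmatrix} \hat{x}'_1 & \hat{x}'_2 & \cdots & \hat{x}'_k \\ y'_1 & y'_2 & \cdots & y'_k \end{pmatrix},$$

where [(x'_i + 1)/2] = [(x̂'_i + 1)/2] = x_i and, likewise, [(y'_i + 1)/2] =
[(ŷ'_i + 1)/2] = y_i. Thus, Q_1 and Q_2 are of the form

$$Q_1 = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \\ y_k & y_1 & \cdots & y_{k-1} \end{pmatrix}, \qquad Q_2 = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \\ y_1 & y_2 & \cdots & y_k \end{pmatrix},$$

and, clearly,

$$Q_1 Q_2^{-1} = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \\ x_k & x_1 & \cdots & x_{k-1} \end{pmatrix}$$

has one cycle of length k.
Q.E.D.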


Corollary 1.1: There are 2^{2k−1} loops that give identical derived sets
(Q_1, Q_2).

This is clear from the fact that there are 2k integers (input and out-
put) in Q_1, and for each (Q_1, Q_2), there are 2^{2k} possible pairs C_1 and
C_2. For any given pair C_1 and C_2, the pair C_2 and C_1 reduces to the
same (Q_1, Q_2); therefore, 2^{2k} is divided by two.

From definition (iv), any P with m loops can be written as

P = (L_1, L_2, ..., L_m),

where L_1, L_2, ..., L_m are disjoint loops. Applying Lemma 1 repeatedly
on L_i, the following important theorem that establishes the relation
between loops and cycles is obtained.


Theorem 2: The product P_1 P_2^{-1}, where P_1 and P_2 are obtained by grouping
one Q from each L_i, has m cycles if and only if P has m loops
(1 ≤ m ≤ N/2). As defined in Section 3.1, P_1 and P_2 thus obtained
are the derived permutations.


Corollary 2.1: There are 2^{N−m} permutations, having m loops, that
will give the same derived permutations (P_1, P_2).

This is proved by repeatedly using Corollary 1.1, and it leads to
another enumeration on the number of permutations P that have m
loops.
Lemma 3: Define R to be the set of all the derived permutations (P_1, P_2)
which have the same product P_1 P_2^{-1}. Then there are (N/2)! pairs (P_1, P_2) in R.
This is true because both P_1 and P_2 are permutations on N/2 integers.


Let C(n, m) denote the number of permutations on n integers that
have m cycles. Then there are C(N/2, m) distinct products P_1 P_2^{-1}
that have m cycles. As a direct consequence of Corollary 2.1 and Lemma
3, the following theorem is established.

Theorem 4: There are exactly 2^{N−m}(N/2)! C(N/2, m) permutations
P which have m loops.


Thus, the enumeration of permutations by loops is related, in a 
simple manner, to the enumeration of permutations by cycles. The 
latter problem has been well studied,®> and the enumeration is gen- 
erally expressed in terms of the Stirling numbers‘ of the first kind, 
s(n, m), as follows: 

C(n, m) = (−1)^{n+m} s(n, m),

where s(n, m) can be evaluated from the following generating
function

$$\sum_{m=0}^{n} s(n, m)\, x^m = x(x-1)(x-2)\cdots(x-n+1),$$

and (−1)^{n+m} s(n, m) is always positive for 1 ≤ m ≤ n. For the interesting
case m = 1, the number of permutations with one loop is

(N/2)! (N/2 − 1)! 2^{N−1}.


3.3 An Example 


This enumeration is illustrated with the case N = 8. If the number
of permutations P that have m loops is denoted by D(N, m), Table I
accounts for all the permutations.

TABLE I—THE NUMBER OF PERMUTATIONS D(N, m) THAT HAVE m LOOPS

   m    C(4, m)    D(8, m)
   1       6       2^7 · 4! · 6  = 18,432
   2      11       2^6 · 4! · 11 = 16,896
   3       6       2^5 · 4! · 6  =  4,608
   4       1       2^4 · 4! · 1  =    384
                   Total      8! = 40,320

Whenever m > 1, there is more than one setting of the B-elements in the
input and output stages that will satisfy the same permutation. This
applies to all the stages as each subnetwork is taken into consideration,
and, thus, the total number of B-elements exceeds log_2(N!).
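Table I can be reproduced by brute force. The sketch below is an illustration with hypothetical names. Integers are 0-based, so 2t and 2t + 1 are duals. Because each element has exactly two connections, a loop is then a connected component of the graph that joins input element x // 2 to output element π(x) // 2. The counts are compared with Theorem 4, using the recurrence C(n, m) = C(n − 1, m − 1) + (n − 1) C(n − 1, m) for the unsigned Stirling numbers of the first kind.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def unsigned_stirling1(n, m):
    """C(n, m): permutations of n objects with exactly m cycles."""
    c = [[0] * (n + 1) for _ in range(n + 1)]
    c[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            c[i][j] = c[i - 1][j - 1] + (i - 1) * c[i - 1][j]
    return c[n][m]


def d_formula(N, m):
    """Theorem 4: D(N, m) = 2^(N - m) (N/2)! C(N/2, m)."""
    h = N // 2
    return 2 ** (N - m) * factorial(h) * unsigned_stirling1(h, m)


def count_loops(p):
    """Loops of a 0-based permutation p (input x -> output p[x])."""
    h = len(p) // 2
    parent = list(range(2 * h))  # 0..h-1: input elements, h..2h-1: output elements

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for x, y in enumerate(p):
        ra, rb = find(x // 2), find(h + y // 2)
        if ra != rb:
            parent[ra] = rb
    return len({find(a) for a in range(2 * h)})


N = 8
observed = Counter(count_loops(p) for p in permutations(range(N)))
for m in range(1, N // 2 + 1):
    # columns: m, C(4, m), Theorem 4, brute-force count
    print(m, unsigned_stirling1(N // 2, m), d_formula(N, m), observed[m])
```

Each printed row should show the formula and the brute-force count in agreement, and the four counts sum to 8! = 40,320.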


IV. CONSTRUCTION OF TEST PERMUTATIONS 


It has been pointed out above that for any P having m loops, cer- 
tain m B-elements at the input and output stages can be arbitrarily 
set. To detect faulty B-elements, one must find a class of input-output 
permutations, or test permutations which are realized by a unique 
setting of the B-elements. The property of such a permutation is that 
it and all its derived permutations, at every level of the network, 
have exactly one loop. To show that they do exist and can be gen- 
erated, one proceeds as follows: 


Lemma 5: If a loop L is given, then any loop L′ formed from L by
taking the dual of one or more of the integer pairs (input or output)
in L will give the same derived sets (Q_1, Q_2), (Q_2, Q_1) being con-
sidered the same as (Q_1, Q_2) for the remaining discussion.


This is obvious from the fact that the characteristic of an integer is 
not changed by taking its dual. 


Theorem 6: Let L and L′ be two loops having the same (Q_1, Q_2).
Then the product L(L′)^{-1} has one cycle if and only if L′ is obtained
from L by replacing every integer except one integer pair (input or
output) in L by its dual.


Proof: That L and L′ do have the same (Q_1, Q_2) is a direct consequence
of Lemma 5. Moreover, the loops L and L′ have the same set of x_i
and y_i, since x_i, x̂_i, y_i, ŷ_i are all in L. Thus the product L(L′)^{-1} is
defined. Now, let L of order 2k be written as follows:

$$L = \begin{pmatrix} x_1 & \hat{x}_1 & x_2 & \hat{x}_2 & \cdots & x_k & \hat{x}_k \\ \hat{y}_k & y_1 & \hat{y}_1 & y_2 & \cdots & \hat{y}_{k-1} & y_k \end{pmatrix}$$

and

$$L' = \begin{pmatrix} \hat{x}_1 & x_1 & \hat{x}_2 & x_2 & \cdots & \hat{x}_k & x_k \\ \hat{y}_k & \hat{y}_1 & y_1 & \hat{y}_2 & \cdots & y_{k-1} & y_k \end{pmatrix},$$

where, without loss of generality, the only unchanged pair is y_k and ŷ_k,
since the ordering of subscripts is immaterial. The transformation from
L to L′ can be expressed in terms of the input-output pairs, namely,
$\binom{\hat{x}_i}{y_i}$ is replaced by $\binom{x_i}{\hat{y}_i}$ and $\binom{x_{i+1}}{\hat{y}_i}$ by $\binom{\hat{x}_{i+1}}{y_i}$, except that $\binom{\hat{x}_k}{y_k}$ is replaced
by $\binom{x_k}{y_k}$ and $\binom{x_1}{\hat{y}_k}$ by $\binom{\hat{x}_1}{\hat{y}_k}$. The input-output pairs for L(L′)^{-1} will be,
in general, $\binom{\hat{x}_i}{\hat{x}_{i+1}}$ for 1 ≤ i < k and $\binom{x_i}{x_{i-1}}$ for 1 < i ≤ k, with the
exception of $\binom{\hat{x}_k}{x_k}$ and $\binom{x_1}{\hat{x}_1}$. Therefore,

$$L(L')^{-1} = (x_1\ \hat{x}_1\ \hat{x}_2 \cdots \hat{x}_k\ x_k\ x_{k-1} \cdots x_2),$$

where the product written in the familiar cycle form has one cycle of
length 2k.

To show the converse, it is sufficient to show that the loop L′, ob-
tained either by taking the dual of every integer in L or by taking the
dual of every integer except two or more integer pairs, will not satisfy
the second property. Referring to the above, it is seen that if every
integer is replaced by its dual, then the product

$$L(L')^{-1} = (\hat{x}_1\ \hat{x}_2 \cdots \hat{x}_k)(x_k\ x_{k-1} \cdots x_1)$$

has two cycles, each of length k.
If two or more integer pairs are unchanged, one can always
write

$$L' = \begin{pmatrix} \hat{x}_1 & x_1 & \cdots & \hat{x}_j & x_j & \hat{x}_{j+1} & \cdots \\ \hat{y}_k & \hat{y}_1 & \cdots & y_{j-1} & y_j & \hat{y}_j & \cdots \end{pmatrix},$$

where the first other unchanged integer pair is y_j and ŷ_j, j < k. Then,
by the same argument given above, the product L(L′)^{-1} has at least
one cycle of length 2j, namely,

$$(x_1\ \hat{x}_1\ \hat{x}_2 \cdots \hat{x}_j\ x_j\ x_{j-1} \cdots x_2), \qquad 2j < 2k. \qquad \text{Q.E.D.}$$


Corollary 6.1: If L is a loop of order k, k ≥ 4, there are k such loops
L′ where L and L′ give identical derived sets (Q_1, Q_2) and L(L′)^{-1}
has one cycle.

This is obvious since there are k/2 input integer pairs and k/2 out- 
put integer pairs. The case k = 2 is a degenerate one, since taking the 
dual of the input pair only yields the same loop as the one obtained by 
taking the dual of the output pair only. Hence, only one L’ is possible. 

By repeatedly using the above theorem, one can show the following. 


Theorem 7: If P = (L_1, L_2, ..., L_m) and its derived permutations
are (P_1, P_2), then another permutation P′ will have the same derived
permutations and P(P′)^{-1} will have m cycles if and only if P′ is ob-
tained by taking the dual of every integer except one integer pair
(input or output) in each L_i, 1 ≤ i ≤ m, in P.


Corollary 7.1: There are k_1 k_2 ⋯ k_m ways of deriving P′ such that
P and P′ give identical derived permutations (P_1, P_2) and P(P′)^{-1}
has m cycles, where k_i is the order of L_i and k_i ≥ 4. If k_i = 2 for some
L_i, it will be taken as unity.


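The following example illustrates what has been discussed. P has
two loops, and the derived permutations (P_1, P_2) have the property
that P_1 P_2^{-1} has two cycles.

$$P = \left(\begin{array}{cccccccccccc} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 8 & 5 & 1 & 9 & 7 & 11 & 3 & 12 & 2 & 10 & 4 & 6 \end{array}\right) = (L_1, L_2),$$

$$L_1 = \begin{pmatrix} 1 & 5 & 6 & 8 & 7 & 11 & 12 & 2 \\ 8 & 7 & 11 & 12 & 3 & 4 & 6 & 5 \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 3 & 9 & 10 & 4 \\ 1 & 2 & 10 & 9 \end{pmatrix}.$$

The decomposition of P yields

$$P_1 = \begin{pmatrix} 1 & 3 & 4 & 6 & 2 & 5 \\ 4 & 6 & 2 & 3 & 1 & 5 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 3 & 4 & 6 & 1 & 5 & 2 \\ 4 & 6 & 2 & 3 & 1 & 5 \end{pmatrix},$$

and P_1 P_2^{-1} = (1 3 4 6)(2 5) has two cycles. P′, which gives the same
pair (P_1, P_2), is obtained from P, and one of the 32 possibilities is

$$P' = \left( \begin{pmatrix} 2 & 6 & 5 & 7 & 8 & 12 & 11 & 1 \\ 7 & 8 & 12 & 11 & 4 & 3 & 6 & 5 \end{pmatrix}, \begin{pmatrix} 4 & 10 & 9 & 3 \\ 1 & 2 & 9 & 10 \end{pmatrix} \right).$$

(The unchanged integers are the output pair 6, 5 in the first loop and the
output pair 1, 2 in the second loop.)
Furthermore, P(P′)^{-1} = (1 6 7 12 11 8 5 2)(3 4 9 10).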
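The example can be checked mechanically. The sketch below is an illustration written for this purpose, not the authors' procedure, and all function names are hypothetical:

- `decompose` groups alternate input-output pairs of each loop, as in the looping procedure of Section 3.2.
- `derived` replaces every integer by its characteristic [(z + 1)/2].
- `times_inverse(a, b)` forms a b^{-1} with a applied first. This is the convention under which the products quoted above come out as stated.

```python
def dual(z):
    """Dual of a 1-based integer: 1<->2, 3<->4, ... (same B-element)."""
    return z + 1 if z % 2 else z - 1


def characteristic(z):
    """[(z + 1)/2] for the base-2 structure."""
    return (z + 1) // 2


def decompose(p):
    """Split p (dict input -> output) into C1 and C2 along each loop."""
    inv = {y: x for x, y in p.items()}
    c1, c2, seen = {}, {}, set()
    for start in sorted(p):
        x = start
        while x not in seen:
            seen.add(x)
            c1[x] = p[x]              # this pair goes to C1
            x2 = inv[dual(p[x])]      # input connected to the dual output
            seen.add(x2)
            c2[x2] = p[x2]            # ... goes to C2
            x = dual(x2)              # its dual input continues in C1
    return c1, c2


def derived(c):
    """Replace every integer of a connection set by its characteristic."""
    return {characteristic(x): characteristic(y) for x, y in c.items()}


def times_inverse(a, b):
    """a b^-1: apply a first, then b^-1 (maps inputs back to inputs)."""
    b_inv = {y: x for x, y in b.items()}
    return {x: b_inv[y] for x, y in a.items()}


def cycles(q):
    """Cycle decomposition of q, each cycle starting at its smallest element."""
    out, seen = [], set()
    for s in sorted(q):
        cyc, x = [], s
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = q[x]
        if cyc:
            out.append(tuple(cyc))
    return out


P = dict(zip(range(1, 13), [8, 5, 1, 9, 7, 11, 3, 12, 2, 10, 4, 6]))
P1, P2 = (derived(c) for c in decompose(P))
print(cycles(times_inverse(P1, P2)))        # [(1, 3, 4, 6), (2, 5)]

# P' in loop form: dual of every integer except output pair 6, 5 (loop L_1)
# and output pair 1, 2 (loop L_2).
P_prime = {2: 7, 6: 8, 5: 12, 7: 11, 8: 4, 12: 3, 11: 6, 1: 5,
           4: 1, 10: 2, 9: 9, 3: 10}
print(cycles(times_inverse(P, P_prime)))    # [(1, 6, 7, 12, 11, 8, 5, 2), (3, 4, 9, 10)]
```

Decomposing P′ the same way returns the pair (P_2, P_1). By Lemma 5 this is considered the same as (P_1, P_2).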

The permutations for which m = 1 can be used in the generation 
of the test permutations. This is achieved, for any N, by starting with 
any permutation on 4 integers that has one loop and applying Theorem 
7 repeatedly in an iterative manner. One can show that there are 


$$2^{\,2N - 3 + (\log_2 N)(\log_2 N - 3)/2}$$


such test permutations by repeatedly using Corollary 2.1 and Corollary 
7.1, with m = 1. The construction of one of these, based on Theorem 7, 
is illustrated as follows. 

In order to clarify the following discussion, test permutations on N
integers and their derived permutations are denoted as T(N) and
(T_1(N/2), T_2(N/2)), respectively. If it is desired to construct a T(16),
then the first step is to select a T_1(4) and generate T_2(4) such that
T_1(4)(T_2(4))^{-1} has one cycle. There are 16 T(4) that have one loop;
one of these is

$$T_1(4) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 2 & 3 \end{pmatrix}.$$

T_2(4) is obtained by taking the dual of every element except one
integer pair (by Theorem 7). One of the four choices is

$$T_2(4) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}.$$

Any permutation that decomposes into T_1(4) and T_2(4) can be used
for T_1(8); one of the 128 possible permutations (by Corollary 2.1) is

$$T_1(8) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 3 & 8 & 5 & 4 & 7 & 6 & 2 \end{pmatrix},$$

where the connection set corresponding to T_1(4) is taken as

$$\begin{pmatrix} 1 & 3 & 5 & 7 \\ 1 & 8 & 4 & 6 \end{pmatrix}.$$

There are eight choices for T_2(8), and one of these is

$$T_2(8) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 2 & 4 & 6 & 7 & 8 & 3 & 1 & 5 \end{pmatrix}.$$

Similarly, one of the possible T(16)’s is

$$T(16) = \left(\begin{array}{cccccccccccccccc} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 4 & 5 & 8 & 12 & 16 & 13 & 10 & 15 & 7 & 6 & 14 & 2 & 11 & 9 & 3 \end{array}\right).$$


It will now be shown that a permutation that is realized by com-
plementing every setting of the B-elements that realizes T(N) except
the one corresponding to inputs 1 and 2 (see network structure) is
also a test permutation, T^c(N), and that it can be generated parallel
to T(N). These two permutations are used for fault detection in a
manner to be described later.


Theorem 8: Let T_1(N) be a test permutation that has (T_1(N/2), T_2(N/2))
as the derived permutations. If there exist two other test permutations
T_1^c(N/2) and T_2^c(N/2) such that the product T_1^c(N/2)(T_2^c(N/2))^{-1} has
one cycle, then one can construct a test permutation which will have
(T_1^c(N/2), T_2^c(N/2)) as the derived permutations.


Proof: Let T_1(N) be decomposed into two reducible connection sets,
C_1 and C_2, where C_1 can be written as

$$C_1 = \begin{pmatrix} x_1 & x_2 & \cdots & x_{N/2} \\ y_1 & y_2 & \cdots & y_{N/2} \end{pmatrix}$$

(x_1 = 1 is arbitrarily defined by the network structure).

The connection set C_1^c is formed by replacing each input and output
integer z ∈ T_1^c(N/2), except x_1 = 1, by x (or y) that has the characteristic
z and has its dual x̂ = x_i for some x_i belonging to C_1, 1 < i ≤ N/2
(or ŷ = y_i ∈ C_1, 1 ≤ i ≤ N/2). Clearly, this can always be done because
every z ∈ T_1^c(N/2) has two integers with z as their characteristic, and
only one of them is in C_1. Similarly, a connection set C_2^c can be formed
from T_2^c(N/2), based on C_2. The permutation obtained simply by
combining C_1^c and C_2^c has the derived permutations (T_1^c(N/2), T_2^c(N/2)),
and it has only one loop. Furthermore, it results in the complementary
setting of B-elements by the looping algorithm, since any integer in C_1^c,
other than x_1 = 1, is the dual of some integer in C_1. Q.E.D.


Theorem 9: One can construct a test permutation T_2^c(N) from T_1^c(N)
in the same way as T_2(N) is obtained from T_1(N) as given in Theorem 7,
and the product T_1^c(N)(T_2^c(N))^{-1} also has one cycle.


Proof: Let T_2(N) be obtained from T_1(N) by taking the dual of
every integer except one pair, say, (x_i, x̂_i). The permutation obtained
from T_1^c(N) by taking the dual of every integer pair except the same
pair (x_i, x̂_i) is indeed a test permutation by Theorem 7. Using the
same argument as in Theorem 8, it is easily seen that the setting of
B-elements to realize this permutation is complementary to that for
T_2(N). Hence it is T_2^c(N). Q.E.D.

Since the permutations T_1(2) and T_2(2), and the corresponding
T_1^c(2) and T_2^c(2), can always be constructed, one can, by induction
on N, construct the two test permutations T(N) and T^c(N) for arbitrary
values of N. One can, in fact, generalize Theorems 8 and 9 to establish
arbitrary relations between two permutations in the B-element settings
in addition to T(N) and T^c(N), for which the settings are complementary.

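The construction procedure for T^c(N) is illustrated by determining
the T^c(16) as related to T(16) given in the previous example. In that
example,

$$C_1 = \begin{pmatrix} 1 & 4 \\ 1 & 3 \end{pmatrix} \quad \text{and} \quad C_2 = \begin{pmatrix} 2 & 3 \\ 4 & 2 \end{pmatrix}.$$

Also

$$T_1(2) = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \quad \text{and} \quad T_2(2) = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix};$$

therefore,

$$T_1^c(2) = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \quad \text{and} \quad T_2^c(2) = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}.$$

All elements except input 1 in C_1^c must have their dual in C_1; also
T_1^c(2) and T_2^c(2) must be satisfied; therefore,

$$C_1^c = \begin{pmatrix} 1 & 3 \\ 4 & 2 \end{pmatrix} \quad \text{and} \quad C_2^c = \begin{pmatrix} 2 & 4 \\ 1 & 3 \end{pmatrix},$$

and

$$T_1^c(4) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \end{pmatrix}.$$

Keeping the input integer pair (1, 2) unchanged, one has

$$T_2^c(4) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix}.$$

Repeating the procedure, one obtains

$$T_1^c(8) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 7 & 6 & 4 & 2 & 8 & 3 & 1 & 5 \end{pmatrix}$$

and

$$T_2^c(8) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 8 & 5 & 1 & 3 & 4 & 7 & 6 & 2 \end{pmatrix},$$

and

$$T^c(16) = \left(\begin{array}{cccccccccccccccc} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 13 & 16 & 10 & 12 & 8 & 1 & 4 & 5 & 15 & 7 & 6 & 14 & 2 & 11 & 9 & 3 \end{array}\right).$$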
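T(16) and T^c(16) can be checked the same way. The sketch below uses hypothetical names and 1-based integers with duals 1↔2, 3↔4, and so on. It tests the defining property of a test permutation: the permutation and every derived permutation, down to two integers, has exactly one loop. It then compares a brute-force count of test permutations on eight integers with two other counts:

- the closed form 2^{2N − 3 + (log_2 N)(log_2 N − 3)/2};
- the recursion implied by Corollaries 2.1 and 7.1, namely 16 choices for N = 4, then N/2 choices of T_2(N/2) and 2^{N−1} permutations per derived pair.

```python
from itertools import permutations


def dual(z):
    """Dual of a 1-based integer: 1<->2, 3<->4, ... (same B-element)."""
    return z + 1 if z % 2 else z - 1


def characteristic(z):
    """[(z + 1)/2] for the base-2 structure."""
    return (z + 1) // 2


def decompose(p):
    """Split p (dict input -> output) into C1 (holding input 1) and C2."""
    inv = {y: x for x, y in p.items()}
    c1, c2, seen = {}, {}, set()
    for start in sorted(p):
        x = start
        while x not in seen:          # walk one loop, alternating C1 / C2
            seen.add(x)
            c1[x] = p[x]
            x2 = inv[dual(p[x])]
            seen.add(x2)
            c2[x2] = p[x2]
            x = dual(x2)
    return c1, c2


def derived(c):
    """Replace every integer of a connection set by its characteristic."""
    return {characteristic(x): characteristic(y) for x, y in c.items()}


def has_one_loop(p):
    """True if the loop through the smallest input covers every input."""
    inv = {y: x for x, y in p.items()}
    seen, x = set(), min(p)
    while x not in seen:
        seen.add(x)
        x2 = inv[dual(p[x])]
        seen.add(x2)
        x = dual(x2)
    return len(seen) == len(p)


def is_test(p):
    """p and all its derived permutations, at every level, have one loop."""
    if len(p) == 2:
        return True
    return has_one_loop(p) and all(is_test(derived(c)) for c in decompose(p))


def perm(*images):
    """Two-row form (1 2 ... N over images) as a dict."""
    return dict(zip(range(1, len(images) + 1), images))


def count_by_formula(N):
    """2^(2N - 3 + log2N (log2N - 3) / 2), valid for N >= 4."""
    n = N.bit_length() - 1
    return 2 ** (2 * N - 3 + n * (n - 3) // 2)


def count_by_recursion(N):
    """16 for N = 4; then (count for N/2) x (N/2) x 2^(N-1)."""
    if N == 4:
        return 16
    return count_by_recursion(N // 2) * (N // 2) * 2 ** (N - 1)


T16 = perm(1, 4, 5, 8, 12, 16, 13, 10, 15, 7, 6, 14, 2, 11, 9, 3)
Tc16 = perm(13, 16, 10, 12, 8, 1, 4, 5, 15, 7, 6, 14, 2, 11, 9, 3)
print(is_test(T16), is_test(Tc16))                  # True True
print([derived(c) for c in decompose(Tc16)])        # T_1^c(8), T_2^c(8)

brute = sum(is_test(perm(*q)) for q in permutations(range(1, 9)))
print(brute, count_by_formula(8), count_by_recursion(8))   # 8192 8192 8192
```

The brute-force loop visits all 8! = 40,320 permutations, which is quick at this size. Larger N should rely on the formula.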


V. INVERSE CONNECTING EQUATIONS FOR BASE-2 STRUCTURE 


The looping procedure for setting the B-elements to realize a given P
is described in Part I. The inverse problem of defining P from the
states of the B-elements is also of some interest. If P can be derived
from the B-element setting, then it is not necessary to store the con-
nections in another memory. Also the inverse connecting equations
are used in the location of faulty B-elements.

In the control algorithm for the base-2 structure, the states of
the B-elements are derived in an iterative manner from the outside
(first) level to the center (log_2 N)th level. Therefore, to obtain the
inverse connecting equations, the states of B-elements in the center
stage are considered first.

The B-elements in the network are numbered (see Fig. 1) such that
the defining equations for each input-output pair appear in a simple
form. μ_{jkl} and ν_{jkl} are the input and output B-elements, respectively.
They are located in the jth stage (counting from the center stage),
the kth (2^j × 2^j) network, and the lth position at the input (or out-
put) stage. The center stage is considered as the first output stage,
and its B-elements are denoted by ν_{1k1}. They are defined as ‘0’ or ‘1’
when set to the straight-through state or the crossover state, respectively.

Fig. 1—The numbering of B-elements in an (N × N) network (input stages nth to 2nd, center 1st stage, output stages 2nd to nth).

The code for the input and output integers is the same as given in
Section 4.1 of Part I. For any input integer x_i (or output integer y_i),
the normal binary representation is its coded form, having a code
length of n = log_2 N, and it is expressed as follows:

x_i = x_{i1} x_{i2} ⋯ x_{in}.


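The inputs to the center stage are designated by α, α = 1, 2, ..., N,
as shown in Fig. 1, and the input-output pairs are ordered according
to α. For each input-output pair $\binom{x_\alpha}{y_\alpha}$, the code words for x_α =
x_{α1} x_{α2} ⋯ x_{αn} and y_α = y_{α1} y_{α2} ⋯ y_{αn} can be calculated from the
following inverse connecting equations:

$$x_{\alpha 1} = (\alpha + 1) \bmod 2,$$
$$x_{\alpha j} = \mu_{jkl}^{\,p \bmod 2\,*}, \qquad l = I_\alpha + 1, \quad 1 < j \le n;$$

and

$$y_{\alpha 1} = \nu_{1[(\alpha+1)/2]1}^{\,\alpha \bmod 2}, \qquad (1)$$
$$y_{\alpha j} = \nu_{jkl}^{\,p \bmod 2}, \qquad l = II_\alpha + 1, \quad 1 < j \le n;$$

where

$$p = \left[\frac{\alpha + 2^{j-1} - 1}{2^{j-1}}\right], \qquad k = \left[\frac{p + 1}{2}\right],$$

and I_α and II_α are the integers represented in the coded form by
x_{α1} ⋯ x_{α(j−1)} and y_{α1} ⋯ y_{α(j−1)}, respectively. The equations
for x_{αj} (or y_{αj}), 1 < j ≤ n, can be obtained in a recursive manner
from the following Boolean equations:

If α and j are such that p is even,

$$x_{\alpha j} = (\bar{x}_{\alpha 1}\bar{x}_{\alpha 2}\cdots\bar{x}_{\alpha(j-1)})\,\bar{\mu}_{jk1} + (\bar{x}_{\alpha 1}\bar{x}_{\alpha 2}\cdots x_{\alpha(j-1)})\,\bar{\mu}_{jk2} + \cdots + (x_{\alpha 1}x_{\alpha 2}\cdots x_{\alpha(j-1)})\,\bar{\mu}_{jk2^{j-1}}.$$

And, if p is odd,

$$x_{\alpha j} = (\bar{x}_{\alpha 1}\bar{x}_{\alpha 2}\cdots\bar{x}_{\alpha(j-1)})\,\mu_{jk1} + (\bar{x}_{\alpha 1}\bar{x}_{\alpha 2}\cdots x_{\alpha(j-1)})\,\mu_{jk2} + \cdots + (x_{\alpha 1}x_{\alpha 2}\cdots x_{\alpha(j-1)})\,\mu_{jk2^{j-1}}.$$

* z^1 = z and z^0 = z̄, the complement of z, where z = 0 or 1.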
The following example of the (8 × 8) network shown in Fig. 2
illustrates the inverse procedure. For each α = 1, 2, ..., 8, the coded
inputs and outputs are calculated.

Fig. 2—B-element setting for $P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 1 & 7 & 4 & 0 & 3 & 5 & 2 & 6 \end{pmatrix}$.

If α = 5, then


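$$x_{51} = (5 + 1) \bmod 2 = 0;$$
$$x_{52} = \mu_{221}^{\,3 \bmod 2} = \mu_{221} = 0;$$
and
$$x_{53} = \mu_{311}^{\,2 \bmod 2} = \bar{\mu}_{311} = 1.$$
Also
$$y_{51} = \nu_{131}^{\,5 \bmod 2} = \nu_{131} = 1;$$
$$y_{52} = \nu_{222}^{\,3 \bmod 2} = \nu_{222} = 1;$$
and
$$y_{53} = \nu_{314}^{\,2 \bmod 2} = \bar{\nu}_{314} = 1.$$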


Therefore, the input-output pair is $\binom{x_5}{y_5} = \binom{001}{111}$. The remaining pairs
$\binom{x_\alpha}{y_\alpha}$ are determined in the same manner. Then the input-output
permutation is

$$P = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 1 & 7 & 4 & 0 & 3 & 5 & 2 & 6 \end{pmatrix}.$$


VI. DIAGNOSIS OF FAULTY B-ELEMENTS

The physical design of the B-element is the major factor in deter-
mining the method of detecting and locating the faulty elements in the
network. For example, the detection of a faulty B-element which
either opens or shorts to ground is trivial. If one has access to the
actual state of each B-element, then the location of faulty elements is
also trivial. In this paper it is assumed that the individual B-element
is not accessible, and it is considered to fail when it remains in one of
the two states.


6.1 Detection 

By using the test permutations T(N) and T^c(N), each B-element
is checked for the two possible states. Failure in any number of
B-elements will be detected by the fact that either T(N) or T^c(N) or
both will not be realized. It is to be noted that any permutation having
more than one loop cannot be used because the setting of some two or
more B-elements is arbitrary, and, therefore, failure of these elements
in certain states may not be detected.


6.2 Location 


With any test permutation T(N), failure of one B-element will result
in a permutation different from T(N) by only two input-output pairs,
that is, the input-output pairs $\binom{x_i}{y_i}$ and $\binom{x_j}{y_j}$ become $\binom{x_i}{y_j}$ and $\binom{x_j}{y_i}$ for
some i and j. The inverse connecting algorithm discussed in Section V
can be used to locate the particular (or the faulty) B-element common
to $\binom{x_i}{y_i}$ and $\binom{x_j}{y_j}$, using their associated α's, which are stored in the memory.

The following example illustrates this procedure. If the test permuta-
tion for an (8 × 8) network is

$$T(8) = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 0 & 2 & 7 & 4 & 3 & 6 & 5 & 1 \end{pmatrix},$$

then, using the same coding scheme as in Section V, the setting of the
B-elements for it is shown in Fig. 3. Also, the α's corresponding to each
input-output pair are calculated by using the inverse connecting
equations, and they are given as follows: α_1, α_2, α_3, α_4, α_5, α_6, α_7,
and α_8 correspond to input-output pairs


Se: E)- Chemo 


Assume that B-element vzi2 is faulty, and that it is stuck in the crossover
position. Then the actual permutation realized is


$$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 0 & 2 & 5 & 4 & 3 & 6 & 7 & 1 \end{pmatrix},$$


and the incorrect pairs of T(8) are (2, 7) and (6, 5), or, in coded form,
(010, 111) and (110, 101). The α's for these pairs are 3 and 2, respectively. By using
the inverse connecting equations (1), the B-elements through which
(2, 7) and (6, 5) are connected are found to be psi2, Mery, Yie1, Yor2, and V3i4,
and pais, Yi11, Yore, and v3i3, respectively. It is seen that B-element
Yor2 is common to both input-output pairs; therefore, it is the faulty
B-element.

Fig. 3—B-element setting for T(8).
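The detection and location rules of Sections 6.1 and 6.2 can be checked on a small simulated network. The sketch below is ours, not the paper's procedure: it builds a recursive rearrangeable network with 2 log2 N − 1 stages (a Beneš-type structure, assumed here), gives its B-elements hypothetical labels (`i`, `o`, `c` plus a U/L path prefix) rather than the labels of Fig. 3, models a fault as an element stuck in one state, and assumes that the second detection test T°(N) sets every element to the opposite state. It also reproduces the coded pairs of the worked example above.

```python
import random


def route(signals, states, tag=""):
    """Route `signals` through a recursive rearrangeable network of size len(signals).

    Returns (outputs, visits): outputs[j] is the signal leaving output j, and
    visits[s] lists the (hypothetical) names of the B-elements signal s passes.
    An element whose state is 1 is crossed; 0 (or missing) is straight through.
    """
    visits = {s: [] for s in signals}

    def element(name, a, b):
        if states.get(name, 0):
            a, b = b, a
        visits[a].append(name)
        visits[b].append(name)
        return a, b

    n = len(signals)
    if n == 2:
        return list(element(tag + "c", *signals)), visits
    upper, lower = [], []
    for k in range(n // 2):  # input stage of this (sub)network
        a, b = element(f"{tag}i{k}", signals[2 * k], signals[2 * k + 1])
        upper.append(a)
        lower.append(b)
    up_out, up_vis = route(upper, states, tag + "U")
    lo_out, lo_vis = route(lower, states, tag + "L")
    for sub in (up_vis, lo_vis):
        for s, names in sub.items():
            visits[s].extend(names)
    outputs = []
    for k in range(n // 2):  # output stage of this (sub)network
        outputs.extend(element(f"{tag}o{k}", up_out[k], lo_out[k]))
    return outputs, visits


def realize(n, states):
    """Return (perm, paths) with perm[i] = output reached by input i."""
    outputs, paths = route(list(range(n)), states)
    perm = [0] * n
    for j, i in enumerate(outputs):
        perm[i] = j
    return perm, paths


def all_elements(n):
    _, paths = realize(n, {})
    return sorted({name for names in paths.values() for name in names})


def detect(n, states, fault):
    """Section 6.1: a fault (name, stuck_state) shows up in T or in its complement."""
    name, stuck = fault
    complement = {e: 1 - states.get(e, 0) for e in all_elements(n)}
    for cfg in (states, complement):
        if realize(n, {**cfg, name: stuck})[0] != realize(n, cfg)[0]:
            return True
    return False


def locate(n, states, fault):
    """Section 6.2: wrong pairs and the elements common to their intended paths."""
    name, stuck = fault
    intended, paths = realize(n, states)
    realized, _ = realize(n, {**states, name: stuck})
    wrong = [i for i in range(n) if intended[i] != realized[i]]
    if not wrong:
        return [], set()
    common = set(paths[wrong[0]]).intersection(*(paths[i] for i in wrong[1:]))
    return [(i, intended[i]) for i in wrong], common


def incorrect_pairs(test_perm, realized):
    """Input-output pairs of the test permutation that were not realized."""
    return [(i, j) for i, (j, r) in enumerate(zip(test_perm, realized)) if j != r]


def coded(pairs, bits):
    return [(format(i, f"0{bits}b"), format(j, f"0{bits}b")) for i, j in pairs]


# Worked example from the text: T(8) versus the permutation actually realized.
T8 = [0, 2, 7, 4, 3, 6, 5, 1]
R8 = [0, 2, 5, 4, 3, 6, 7, 1]
pairs = incorrect_pairs(T8, R8)  # the pairs (2, 7) and (6, 5)
codes = coded(pairs, 3)          # ('010', '111') and ('110', '101')
```

In the simulation, flipping any single element exchanges the downstream routes of the two signals passing through it, so exactly two pairs go wrong and the faulty element lies on both of their intended paths. More than one element can be common to the two paths (for example, a first-stage and a last-stage element shared by the same two signals), in which case further test permutations are needed to separate them.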

If two B-elements are faulty, either three or four pairs in
T(N) are not realized. If four input-output pairs are wrong, the locations
of the faulty elements can be determined in the same manner as described
above. Three pairs are incorrect when one particular pair $(i, i')$ is connected
through both of the faulty elements. For this case, it is necessary to
change T(N) so that one of the faulty elements is in the proper state,
and then the other one can be located. This is achieved by using
$2\log_2 N - 1$ test permutations, each constructed (using a
generalized form of Theorems 8 and 9) to complement a different stage
of input or output B-elements, one stage at a time. The same procedure
as above is then used to locate the faulty B-element.

If the faulty B-elements are restricted to one stage of the network,
then this stage can be located in a manner similar to the above. For
this case, a set of $\log_2 N$ test permutations T(N) is used to complement the
B-elements on each stage (input and output) of the network. The α's
corresponding to the incorrect pairs $(i, i')$ will remain the same until the
faulty elements are complemented. Therefore, the stage containing
the faulty B-elements is determined.


6.3 Adaptive ‘Looping’ Algorithm 


In the looping algorithm as given in Part I, the derived permuta- 
tions P,; and Ps are always routed through the upper and lower 
(N/2 × N/2) networks, respectively. This is because in the most
efficient network structure (see Fig. 1 of Part I) one B-element in the
first stage of each subnetwork (i.e., $p_{jk}$, $1 \le j \le \log_2 N$, $1 \le k \le N/2^j$)
is fixed in the straight-through state. However, by introducing
redundant B-elements $p_{jk}$ (as in Fig. 1), one can change (or adapt) the
control algorithm at certain stages to realize a particular permutation
if there is one faulty B-element per subnetwork at each stage.


VII. CONCLUSION 


Important relationships between the loops of input-output permuta- 
tions and cycles of permutations are established. These properties are 
used to enumerate the input-output permutations in terms of loops 
and to construct special test permutations which require unique 
B-element settings. Also, inverse connecting equations which define the 
input-output permutation from the states of the B-elements are
derived. These ideas are utilized in the diagnosis of faulty B-elements. 

It is clear that network failure due to any number of faulty elements, 
which may be distributed over many stages, can be easily detected 
by using only a pair of test permutations. If these faulty elements 
are limited to only one stage of the network, this stage can be located 
by employing the inverse equations and a set of test permutations. 
Furthermore, if only one or two elements fail, their exact positions in 
the network can be located by employing a similar procedure. 

If the faulty elements are limited to only one in the first stage of 
each subnetwork, then any input-output permutation can be realized 
correctly by adding a redundant B-element in the first stage of each 
subnetwork and adapting the looping algorithm at the appropriate 
subnetworks. 

The fact that this type of rearrangeable switching network has some 
attractive diagnostic properties should enhance the possibility of it 
being used in some practical switching systems. 
A Full-Duplex Echo Suppressor Using
Center-Clipping 


By O. M. MRACEK MITCHELL and DAVID A. BERKLEY 
(Manuscript received October 9, 1970) 


For telephone circuits which include synchronous satellites, con- 
ventional echo suppressors of the voice-switching type are less than 
satisfactory because of speech mutilation and the presence of echo 
during double talking.1 We have found that a multiband center-clip- 
ping process may be used as an echo suppressor. This echo suppressor 
is unique in that no double-talking decision has to be made. The near- 
end signal, plus echo of the far-end signal, is divided into several 
contiguous bands with each filter output going to a center clipper. A 
control circuit sets each clipping level equal to or greater than the echo 
level in that band. A preliminary analogue implementation of this echo 
suppressor, in which control circuit gains were manually adjusted to
match the experimental return loss, was informally demonstrated using 
a simulated satellite circuit. Although no attempt at quantitative evalu- 
ation has yet been carried out and further evaluation is necessary, no 
echo was reported during this demonstration, even during double talk- 
ing, for return losses approaching 0 dB. Operation appeared to be 
full-duplex at all times with little distortion of the speech. For return 
losses greater than about 15 dB, the center-clipping system was almost 
indistinguishable from a 4-wire connection with no echo path. In 
practice, adaptive setting of control circuit gains as a function of
return loss would be desirable if this technique is used as a replace- 
ment for conventional echo suppressors. 


I. INTRODUCTION 


During investigations of a multiband center-clipping process for use 
in reverberation reduction,2 it occurred to us that this process, which
can remove the effects of long-time reverberation or echoes in a room, 
could also be used to remove echoes in telephone lines resulting from 
imperfect hybrid junctions.*? Independently, J. R. Pierce also sug- 




1620 THE BELL SYSTEM TECHNICAL JOURNAL, MAY-JUNE 1971 


gested that this process could be applied to echo suppression and pro- 
posed a scheme for controlling the levels of the center clippers in a 
conventional split echo suppressor configuration.* 

One end of a conventional split echo suppressor is shown in Fig. 1. 
It is located in the 4-wire section of line near the hybrid junction to 
the 2-wire loop of the near-end customer. A similar configuration is 
inserted at the other end of the 4-wire trunk. Because of imperfect 
balancing of the hybrid, part of the received signal from the far-end 
talker feeds through the hybrid to the transmit side of the 4-wire line. 
The return loss of the hybrid is typically 15 dB, that is, the echo 
level at the echo suppressor is 15 dB below the normal transmit signal 
level of the near-end talker measured at the same point. The con- 
ventional echo suppressor is a voice-operated switch. The logic and 
control circuit detects the presence of received signal and causes a loss 
of at least 50 dB to be inserted in the path of the echo signal on the 
transmit side. Since the loss would also attenuate the signal from the 
near-end talker, and temporarily make the connection one way, the 
logic and control circuit also detects the presence of double talking and 
puts the suppressor into a “break-in” mode which allows an interrup- 
tion to take place. 


Fig. 1—One end of a conventional split echo suppressor.
Alternatively, we have found that the echo signal can be removed by
replacing the voice switch with the multiband center-clipping process
which is mentioned above, and which we have described previously.2
This configuration is shown in Fig. 2. The outgoing signal from the 
hybrid is divided into a number of contiguous frequency bands by an 
input filter bank, each band is center clipped independently, and then 
the odd harmonic distortion products introduced by the center clippers 
are removed by an output filter bank generally identical to the input 
filter bank. For echo suppression, the center-clipping levels are con- 
trolled by the received signal. This signal is divided into contiguous 
bands by a control filter bank which is identical to the input filter 
bank. The attenuation in each band is adjusted to be equal to or less 
than the trans-hybrid loss in that band so that control signals identical 
to or larger than the filtered echo are obtained. The output of each 
band is peak detected and the detected output sets the clipping level 
in the corresponding center clipper so as to remove the echo signal in 
that particular band. In the absence of received signal, the clipping 
levels are zero. The clipping levels should have rise times short enough
to follow the speech in each band, and a hold time greater than the echo
end delay, which may be up to 25 ms.
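The signal flow of Fig. 2 can be sketched in a few lines of NumPy. This is a simplified model under stated assumptions, not the analogue circuit: ideal FFT brick-wall filters stand in for the Butterworth filter banks, the peak detector is modelled as an instantaneous-attack running maximum whose window (`hold`, in samples) plays the role of the hold time, the clipper is the Fig. 3 characteristic, and the output filter bank is omitted. The function names, band edges, and test signals are hypothetical.

```python
import numpy as np


def band_split(x, fs, cuts):
    """Split x into contiguous bands with ideal FFT filters.

    Band k covers [edges[k], edges[k+1]) Hz with edges = [0, *cuts, inf], so the
    bands partition the spectrum and sum back to x (up to rounding).
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    edges = [0.0, *cuts, np.inf]
    return np.array([
        np.fft.irfft(spectrum * ((freqs >= lo) & (freqs < hi)), n=len(x))
        for lo, hi in zip(edges[:-1], edges[1:])
    ])


def peak_hold(control, hold):
    """Clipping level: max |control| over the current and previous `hold` samples."""
    padded = np.concatenate([np.zeros(hold), np.abs(control)])
    windows = np.lib.stride_tricks.sliding_window_view(padded, hold + 1)
    return windows.max(axis=1)


def center_clip(x, level):
    """Fig. 3 characteristic: zero inside +/-level, unchanged outside."""
    return np.where(np.abs(x) > level, x, 0.0)


def suppress(transmit, received, fs, cuts, atten_db, hold):
    """One end of a multiband center-clipping echo suppressor (simplified).

    transmit: near-end signal plus echo from the hybrid.
    received: far-end signal that drives the control filter bank.
    atten_db[k]: control attenuation in band k, set <= the trans-hybrid loss.
    """
    out = np.zeros(len(transmit))
    for xb, cb, a in zip(band_split(transmit, fs, cuts),
                         band_split(received, fs, cuts), atten_db):
        level = peak_hold(cb * 10 ** (-a / 20), hold)
        out += center_clip(xb, level)
    return out


fs = 8000
rng = np.random.default_rng(0)
far = rng.standard_normal(fs)       # 1 s of noise standing in for far-end speech
echo = 10 ** (-15 / 20) * far       # hypothetical 15 dB return loss, no delay
cuts = [500, 1000, 2000]            # hypothetical 4-band split
out = suppress(echo, far, fs, cuts, atten_db=[14] * 4, hold=200)
```

With the control attenuation at or below the trans-hybrid loss in every band, an echo arriving alone is clipped to zero, while near-end speech with no received signal passes unchanged because every clipping level is zero. A delayed echo is also covered, apart from edge effects of the circular FFT filters, as long as `hold` is at least the echo delay in samples.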

This center-clipping system has several advantages over existing 
echo suppressors of the voice-switching type. Since the frequency 
spectrum is divided into a number of bands, the near-end signal is 
unaffected in bands where there is no energy in the echo signal and 
the echo is completely removed in bands where there is no near-end 
signal component. However, the main advantage appears to come 
from the use of center clipping as opposed to voice switching. Break-in 
of the near-end talker can occur without a double-talking decision, 
even for a return loss approaching 0 dB, and no echo is heard during 
double talking. A comparison of the effect of center clipping and voice 
switching on signals will be discussed in the next section to show 
how these advantages come about. 


II. CENTER CLIPPING AS AN ALTERNATIVE TO VOICE SWITCHING 


The transfer function of the center clipper we will discuss is shown 
in Fig. 3. This center clipper completely eliminates signals below the 
clipping level, but leaves instantaneous signal values greater than the 
clipping level unaffected. In a sense, a center clipper is a voice switch 
operating on the instantaneous amplitude of the signal. However, it 
differs greatly from the process commonly referred to as voice switch- 
Fig. 2—One end of a split center-clipping echo suppressor.


ing. As we have mentioned in the preceding section, a large constant 
amount of attenuation (>50 dB) is generally switched into the trans- 
mit path in response to the control signal. In principle a more ideal 
kind of voice switching would be switching of only the amount of 
attenuation required, in addition to existing hybrid return loss, to 


Fig. 3—Minimum distortion center-clipping transfer function.
Fig. 4—Comparison of voice-switching and center-clipping necessary to produce
50 dB of echo suppression for return losses of: (a) 6 dB, (b) 20 dB, (c) 44 dB. 


reduce the unwanted signal to a tolerable level. It is this kind of voice 
switch which we will compare with the center clipper of Fig. 3. 

For satellite communications connections, a conservative estimate 
is that the echo signal level should be about 50 dB below the level of 
the near-end talker. Consider the situation depicted in Fig. 1, however, 
where the suppressor loss is replaced by either the minimum amount 
of attenuation or center clipping required, and where no double-talking 
detector is provided. The basic difference between these two hypothet- 
ical processing systems is shown in Fig. 4 for three values of return 
loss. The output of the echo suppressor for each case is shown in 
response to a sinusoidal signal, at 0 dBm0, from the near end into the 
echo suppressor. These graphs apply during the hold-over time after 
the voice switching or clipping level has been set by a previously 
received signal of the same transmission level as the near-end signal, 
and where the echo level has decreased to a negligible value. 

In Fig. 4a, for a return loss of 6 dB, the echo signal is at −6 dB




relative to the near-end signal, i.e., at −6 dBm0. Consequently, an
attenuation of 44 dB has to be switched into the transmit path to 
achieve the desired 50 dB suppression. During the hold-over, this 
would drop the near-end signal by 44 dB. On the other hand, center 
clipping at one-half peak amplitude eliminates the echo and results in 
only 6 percent loss of fundamental signal energy. 

In Fig. 4b, the signals for a return loss of 20 dB are shown. The 
echo signal is at −20 dBm0. Voice switching of 30 dB of attenuation
reduces the near-end signal to −30 dBm0 while center clipping at 10
percent of peak, sufficient to remove the echo, produces very little 
distortion of the near-end signal. 

Even when the unwanted signal is −44 dBm0 as in Fig. 4c,
voice switching of 6 dB is necessary. This reduces the near-end 
signal to half amplitude while the corresponding center clipping at 
1 percent of peak results in negligible effect on the near-end signal. 
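The numbers quoted for Fig. 4 follow from two one-line formulas, and the cost of clipping a sinusoid can be checked numerically. In this sketch the function names are ours; the 50 dB target and the Fig. 3 clipper come from the text, and the echo is treated as a sinusoid whose peak lies exactly the return loss below the near-end peak.

```python
import numpy as np

TARGET_DB = 50.0  # echo should sit about 50 dB below the near-end talker


def voice_switch_loss_db(return_loss_db):
    """Minimum extra loss a voice switch must insert to reach TARGET_DB."""
    return max(TARGET_DB - return_loss_db, 0.0)


def clip_fraction(return_loss_db):
    """Clipping level, as a fraction of the near-end peak, equal to the echo peak."""
    return 10 ** (-return_loss_db / 20)


def clipped_sine(c, samples=200_000):
    """Center-clip a unit sine at level c with the Fig. 3 characteristic.

    Returns (fraction of signal energy kept, fundamental amplitude of the output).
    """
    theta = (np.arange(samples) + 0.5) * 2 * np.pi / samples
    x = np.sin(theta)
    y = np.where(np.abs(x) > c, x, 0.0)
    kept = np.mean(y ** 2) / np.mean(x ** 2)
    fundamental = 2 * np.mean(y * x)  # Fourier sine coefficient at the fundamental
    return kept, fundamental


for rl in (6, 20, 44):  # the return losses of Fig. 4a, 4b, 4c
    print(rl, voice_switch_loss_db(rl), round(clip_fraction(rl), 4))
```

For the three return losses this gives insertions of 44, 30, and 6 dB and clipping levels of about 0.50, 0.10, and 0.0063 of peak; the text rounds the last up to 1 percent, which errs on the safe side. `clipped_sine(0.5)` returns about 0.942 for both quantities: clipping at half peak removes about 6 percent of the sinusoid's energy, and the fundamental amplitude drops by the same fraction (the energy in the fundamental alone, which scales with amplitude squared, falls by about 11 percent).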

It is evident in Fig. 4 that, for reasonable return loss, center clipping 
is a much less severe form of processing than is voice switching, 
especially when narrow-band center clipping is used to avoid harmonic 
distortion products in the output. Because of the relatively slight 
mutilation of the near-end signal by the center clipping, the center 
clippers do not have to be removed during double talking. Thus no 
separate double talking detector has to be used. Echo suppression is 
also quite effective during double talking and will be discussed in 
more detail in Section V. 


III. SIMULATION AND IMPLEMENTATION 


Initially, we simulated the center-clipping echo suppressor on a CDC 
3300—EAI 8800 hybrid computer. Double talking was simulated with 
return losses of 15 and 30 dB and the output of the center-clipping 
process was recorded for each condition. No echo was heard in either 
case. For 15 dB return loss, a small amount of degradation of the 
near-end speech was noticeable after processing. For 30 dB return loss, 
negligible degradation of the near-end speech resulted from the cen- 
ter clipping. 

In order to study the center-clipping process under actual conditions 
of double talking, we needed a real-time processing system. The 
required center clippers and control circuits for the clipping levels 
were designed and built using analogue components. However, the 
clippers used were not the minimum distortion form shown in Fig. 3, 
but the somewhat less efficient form of Fig. 5.5 The peak detectors had 
switchable decay times of 0 ms for alignment and 10 ms for use
during echo suppression. Three General Radio (GR) Model 1925 
filter banks composed of 1/3-octave 6th-order Butterworth filters 
were used to complete the center-clipping system. 

We investigated the center-clipping system as an echo suppressor in 
a simulated toll circuit designed for evaluation of echo suppressors. 
Figure 6 is a simplified diagram of one end of the circuit. This circuit 
connects two 4-wire telephones, with active sidetone, via a 4-wire delay 
path. Hybrids are simulated by echo paths in which return loss can 
be set from 0 to 50 dB. Selection of various echo suppressors or a 
4-wire line is provided between the two echo paths and the 4-wire 
network. For comparison, we had available the center-clipping echo 
suppressor (one end of a split system), a split 3A echo suppressor with 
speech compression, and a 4-wire connection. All systems were 
lowpass-filtered at 3200 Hz. The 3A units are echo suppressors employ- 
ing voice switching, currently in use in the telephone plant. We also 
had available about 0.6 second of tape delay, which was introduced as 
shown in Fig. 6, for simulation of a satellite connection. 

The control circuit attenuators in the center-clipping system (Fig. 
2) were adjusted manually so that echoes of far-end sinusoidal signals 
were completely eliminated in each band for the selected return loss. 
This initial adjustment resulted in no echo being heard during single 
talking. 


IV. RESULTS 


Evaluation of the performance of an echo suppressor is a difficult 
task because most meaningful testing has to be done during normal 
Fig. 5—Center-clipping transfer function implemented in analogue circuits.
Fig. 6—One end of simulated toll circuit for testing echo suppressors.


conversations. No attempt at a quantitative evaluation of the center- 
clipping echo suppressor has as yet been carried out. However, in 
this section we present results of informal demonstrations using the 
simulated toll circuit. 

The system initially used had six 2/3-octave bands. It performed 
very well in suppressing echoes in that no echo was heard by the 
far-end talker, even during double talking, for return losses down to 
0 dB. However, even during single talking from the near end, some 
degradation was unexpectedly still present. This was due to a combi- 
nation of phase distortion and coloration caused by passing the 
speech through two of the GR filter banks before recombining the 
bands. Each of the filter banks has a spectral ripple which is about 
±1 dB. However, the spectral ripple is several dB for two filter
banks in series. In addition, phase distortion, which is not serious in 
one filter bank, is doubled for two filter banks and becomes objection- 
able. Because the phase delays correspond to those of the 1/3-octave 
filters combined to make 2/3-octave bands, the distortion is greater 
than was present in the original computer simulation. 

In order to improve the speech quality in single talking, we substi- 
tuted GR 1-octave filters, center frequencies 250, 500, 1000, and 
2000 Hz, for the four lowest filters and used a 1/3-octave filter, center 
frequency 3150 Hz, at the top of the frequency band to make a 
5-channel system. This system covered the same total bandwidth as the 
6-channel system but had less phase distortion because of the wider 
filters used. Its performance is expected to be nearly identical to that
of a 4-channel system since the same bandwidth could be covered by 
4 filters, each only slightly wider than one octave. 

The 5-channel system was as effective in suppressing echoes as the 
6-channel system. As expected, the speech quality for single talking
from the near end was improved but was still slightly degraded by 
coloration. As a result, we found that we could get better quality dur- 
ing single talking by removing the output filter bank. Because there 
is no clipping during single talking, output filters are unnecessary for 
this condition since no distortion products are generated. Surprisingly, 
however, distortion of the near-end speech during double talking was 
not very noticeable to the far-end talker who was simultaneously 
talking and listening. This was apparently due to masking. 

We have demonstrated the systems to numerous people in different 
areas of Bell Laboratories. In these demonstrations, the speech quality 
of the center-clipping system was judged to be comparable to the 
simulated 4-wire satellite connection (or the 3A echo suppressors) for 
single talking conditions. When a comparison was made between the 
center-clipping system and the 3A echo suppressors during double 
talking, they differed in two respects. First, noticeable echo could be 
heard during double talking with the 3A echo suppressors since the 
3A’s offer little echo suppression in the break-in mode, while no echo 
was heard during double talking with the center-clipping echo sup- 
pressor. Second, the 3A’s gave a chopped quality to the speech ap- 
parently independent of the return loss, as they switched between 
suppression and break-in, while this kind of switching sound was 
absent from the center-clipping system. (In the break-in mode of the 
3A’s during double talking, a variable amount of loss is introduced 
into the receive paths depending on the relative and absolute levels 
of the two end signals.) With the control circuits adjusted for return 
losses less than about 15 dB, the center-clipping system contributed 
some distortion to the speech during double talking which became 
more noticeable as the return loss was decreased to 0 dB. However, 
for return losses greater than 15 dB, the center-clipping system was 
almost indistinguishable from a 4-wire connection with no echo path. 


V. DISCUSSION 


The center-clipping process is a unique echo suppressor in that no 
decision between single talking and double talking has to be made. 
It is obvious how it operates under single-talking conditions. In single 
talking from the far end, the clipping levels are set with a rise time
faster than any speech component so as just to remove the echo in each 
band. When the received signal ceases, the clipping levels fall to zero 
with a holding time greater than the end delay. For single talking from 
the near end, the clipping levels are zero and the speech is, in principle, 
unaffected.

It is not so apparent how echo is eliminated during double talking.
In this case, the echo signal is added to the near-end signal and this 
composite signal is fed to the input filter bank. The clipping levels 
still follow the echo signal, and eliminate echo in bands where the 
two signals do not overlap and during gaps between words and 
sentences in the near-end speech. When energy from both signals 
appears in any band, clipping cannot remove the echo signal. How- 
ever, it appears that, in this case, the echo is partially masked in 
that band. For these reasons, it is probably advantageous to have 
the bandwidths of the channels as small as possible compatible with 
other system requirements of speech quality and cost. These consid- 
erations indicate that the minimum number of channels possible may 
be determined by the effectiveness in echo suppression rather than 
by the avoidance of harmonic distortion. That is, a 3-channel system
may not perform as well in echo suppression even though there is no 
harmonic distortion at the output. (A 3-channel system with band- 
widths of individual filters just under two octaves includes no har- 
monic distortion products in the output since only odd-harmonic 
distortion products are produced by the center clippers). So far a 
3-channel system has not been investigated. 

In the demonstrations described, control signal levels were adjusted 
manually to match the trans-hybrid loss. In practice, this setting 
should either be permanently adjusted for worst case or adaptively 
controlled. If a center-clipping system is used as a back-up for an 
echo canceller,6 worst-case setting will still yield almost perfect
results. However, in the normal network, where 6 dB return loss is 
the worst case, adaptive setting, even if quite crude, would be 
desirable. 

As mentioned in the preceding sections, several kinds of speech 
degradation occur in the center-clipping echo suppressor. Inherent in
the process is the degradation observed in the computer simulation 
where coloration and phase distortion of the filters and nonlinear 
distortion of the center clippers were minimized. In this case, degrada- 
tion resulted mainly from loss of part of the signal caused by the 
center-clipping process. However, considerable loss of information can 


ECHO SUPPRESSOR 1629 


be tolerated without significant decrease in subjective quality because 
of the redundant nature of speech. In the analogue experiments, other 
distortions were present in addition to this inherent one. Because of 
this, optimum operation was realized with the output filter bank 
removed even though nonlinear distortion was present during double 
talking. 

So far, all the discussion of evaluation of center clipping echo 
suppression has been for the case of a 0.6-second transmission delay. 
For shorter delays, the subjective effect of the degradations present 
during double talking is greatly reduced and speech of comparable 
quality is obtained for smaller return losses. 


VI. CONCLUDING REMARKS 


We have described demonstrations of an experimental center- 
clipping system for electrical echo suppression. This echo-suppressor 
principle is unique in that no double-talking decision has to be made.
Echoes appear to be completely removed, even during double talking, 
for return losses as small as 0 dB. Speech communication is full- 
duplex at all times and, for return losses greater than about 15 dB, 
is almost indistinguishable from a 4-wire connection. The center- 
clipping echo suppressor would appear to be an excellent back-up for 
an echo canceller if the echo cancellation plus return loss reduces 
the echo level to −20 dBm0 or less.

We have also made tests of this echo suppressor using a “real” end 
section including an N8-carrier, 4-1/2 miles of simulated loaded cable, 
and a real telephone and hybrid. The results were similar to those 
already discussed when the attenuation in each band was manually 
adjusted to match the return loss characteristics of the carrier system. 
(Return loss varied from about 6 to 18 dB.) 

In another experimental application, we have used the center- 
clipping system as a replacement for voice switching in the suppres- 
sion of acoustical echo generated in an idealized 4-wire speakerphone. 
Low-Loss Modes in Dielectric
Lined Waveguide


By J. W. CARLIN and P. D’AGOSTINO 
(Manuscript received December 15, 1970) 


Recent studies of the heat loss characteristics of the normal modes in 
dielectric lined circular waveguide have shown that modes other than
those of the circular electric type may have low loss over wide
frequency bands. This unexpected behavior of the mode loss
characteristics is explained by utilizing the well-known duality
relationships between the electric and magnetic fields. Specifically, it is shown that
the lowest loss modes are alternately circular electric and circular 
magnetic as frequency (or lining thickness) increases, with low loss 
occurring at frequencies and lining thickness where the wall impedance 
of the dielectric coated guide approximates a short circuit (or electric 
wall) for circular electric modes, and an open circuit (or magnetic 
wall) for circular magnetic modes. 

These findings will influence and aid in the selection and design of
an appropriate waveguide(s) (employing the circular electric TE01
mode) for the WTS millimeter wave transmission system which is
presently under development; they may also influence the design of 
future guided wave systems. 


I. INTRODUCTION 


The possibility of using a circular electric (TE01) mode in circular
waveguide has been of considerable interest since the initial discovery 
of its desirable low-loss characteristics (in the following, we are con- 
cerned only with the heat loss in the guide). One type of guide cur- 
rently under consideration (Fig. 1) consists of a highly conducting 
outer wall to which a thin dielectric liner is bonded to break the 
degeneracy in phase velocities for the TE01 and TM11 modes in
metallic circular waveguide. 

The circular electric mode loss characteristics of thinly lined circular 
waveguide have been determined by H. G. Unger.-* This work has 
Fig. 1—Dielectric lined circular waveguide.


since been extended* and it was found that other modes also have very 
low heat loss over an appreciable frequency range. A full scale discus- 
sion of the analysis is beyond the scope of this paper but will be avail- 
able in a forthcoming paper.* In the following, the analytical ap- 
proach is indicated and some typical results for the heat loss (copper 
and dielectric loss) of the TE01, TE02, and TM02 modes in lined
circular guide are given. 


II. DISCUSSION 


The problem of obtaining the normal mode loss for a perfectly 
straight circular waveguide with a uniform lining, as shown in Fig. 1, 
was approached in two ways. In one approach, the well-known induced 
current method was used. The field and wall currents in the guide were 
found for a lossless structure; they were then used to determine the 
heat losses of the waveguide. In the second approach, the impedance 
at the dielectric-free-space interface was prescribed as a boundary 
condition for the solution of the wave equation. This impedance was 
established by using a transmission line model to transform the sur- 
face impedance of the copper conducting wall through the dielectric 
lining, which was assumed to have a small but finite loss tangent. The 
complex eigenvalue equation was then solved and the overall losses 
thus determined. 

The results of these two methods are in good agreement. The wall 
impedance approach aids in understanding the physical phenomena 
occurring in dielectric lined guide and is used in the following para- 
graphs to explain the loss characteristics of circular electric and mag- 
netic modes in dielectric lined guide. 
In Fig. 2, we have plotted the total heat loss for some circular
electric and magnetic modes in lined waveguide with a 1-percent
lining (ρ = 1.01). Here ρ is defined as the ratio of the waveguide
conductor radius to dielectric radius. The losses of the circular electric
modes initially decrease with increasing frequency, as in unlined 
waveguide, and reach minimum values at approximately 140 GHz (at 
this point, the dielectric loss accounts for 8 percent of the total heat 
loss). The TE0n losses then increase rapidly with a further increase
in frequency. We observe an interesting phenomenon at approximately 
Fig. 2—Loss characteristics of circular symmetric modes in lined waveguide
(wall impedance model). 


240 GHz. At this frequency the lining is approximately a quarter
wavelength thick ($\lambda_{\mathrm{rad}}/4$) in the radial direction. The equivalent
wavelength $\lambda_{\mathrm{rad}}$* in the radial direction for the dielectric region is
related to the free-space wavelength $\lambda$ by

$$\lambda_{\mathrm{rad}} = \frac{\lambda}{\sqrt{\epsilon - 1}}. \qquad (1)$$
As the frequency increases beyond 240 GHz, the TE01 loss increases
indefinitely. This is attributable to a surface-wave phenomenon (the 


*In highly overmoded guide, it can be shown that the radial propagation constant in the dielectric region is $(2\pi/\lambda)\sqrt{\epsilon - 1}$ by considering plane-wave reflection by a grounded slab at grazing incidence.
field is bound to the dielectric region), and the fact that it occurs
only when the lining thickness is greater than a quarter wavelength is
in agreement with the minimum thickness (n = 1) required to propagate
a TE surface wave on a grounded slab:6

$$t = \frac{n\lambda}{4\sqrt{\epsilon - 1}}, \qquad n = 1, 3, 5, \ldots \qquad (2)$$

The TE02 loss (20 percent of which is due to dielectric losses in the
lining at this frequency), conversely, decreases as the frequency
becomes greater than 240 GHz and reaches a minimum when the
lining is a half wavelength thick ($\lambda_{\mathrm{rad}}/2$), as shown in Fig. 2. The
TE02 loss would then increase rapidly with a further increase in
frequency, as the mode eventually propagates as a surface wave when the
lining is three-quarters of a wavelength thick [corresponding to n = 3 in
equation (2)].

The most interesting feature of Fig. 2 is the steadily decreasing
loss of the TM02 mode as the frequency increases. At the upper end
of the proposed WTS frequency band (110 GHz), the TM02 loss (10
percent of which is due to dielectric loss) is a surprisingly low 4.7 dB/km
for this example. The TM02 mode also has lower loss than any circular
electric mode over the frequency range of 180-350 GHz. At 240 GHz,
the TM02 loss is 0.54 dB/km, and dielectric losses account for 13
percent of this.

From Fig. 2 we observe that the TM0n loss characteristics for a
lining of thickness t are similar to those of a TE0n mode in a guide
with a lining thickness a quarter wavelength greater. The TM0n modes
have loss minima for linings λ_rad/4, 3λ_rad/4, ... thick, while the TE0n
modes have loss minima for 0, λ_rad/2, ... thick linings. The minimum
dielectric thickness required for a TM0n mode to propagate as a surface
wave is⁶

$$t = \frac{n\lambda}{4\sqrt{\epsilon - 1}} = \frac{n\lambda_{\mathrm{rad}}}{4}, \qquad n = 0, 2, 4, \ldots, \qquad (3)$$

which differs from the minimum thickness for a TE0n surface-wave
mode in (2) by a quarter wavelength. From (3) we see that the TM01
mode propagates as a surface wave for very thin linings (n = 0); the
TM01 loss curve was not shown in Fig. 2, since the loss steadily
increased with increasing frequency from an initial value of 47 dB/km
at 40 GHz.
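Equations (1) through (3) reduce to simple arithmetic once a frequency and a lining permittivity are chosen. The sketch below evaluates them; the permittivity ε = 2.34 is an assumed example value (the polyethylene figure used later in this issue), since this section does not state the lining permittivity, and the function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def radial_wavelength(freq_hz, eps_r):
    """Equation (1): lambda_rad = lambda / sqrt(eps_r - 1)."""
    return (C0 / freq_hz) / math.sqrt(eps_r - 1.0)


def threshold_thickness(n, freq_hz, eps_r):
    """Equations (2) and (3): t = n * lambda_rad / 4.
    Odd n gives TE0n surface-wave thresholds, even n gives TM0n thresholds."""
    return n * radial_wavelength(freq_hz, eps_r) / 4.0


if __name__ == "__main__":
    freq, eps = 240e9, 2.34  # eps = 2.34 is an assumed example permittivity
    print(f"lambda_rad = {radial_wavelength(freq, eps) * 1e3:.3f} mm")
    for n in range(5):
        family = "TE" if n % 2 else "TM"
        t = threshold_thickness(n, freq, eps)
        print(f"n = {n} ({family}): t = {t * 1e3:.3f} mm")
```

Read against the text, the odd-n TE thresholds fall at the same thicknesses (λ_rad/4, 3λ_rad/4, ...) as the TM0n loss minima, and the even-n TM thresholds at the TE0n loss minima.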

The preceding loss characteristics can be explained by a simple
physical argument and the use of duality. Let us consider Fig. 3. Here
we have sketched the field distributions of the circular electric and
circular magnetic modes for no lining, a quarter-wave lining, and a
half-wave lining. For the TM01 and TM02 modes with no lining (Fig.
3a and c), we have a strong normal electric field at the wall of the
guide with a high induced electric charge density and hence high loss.
For the TE01 and TE02 modes (Fig. 3b and d) the losses are quite low.

For a quarter-wave lining, the impedance at the dielectric inner face
is approximately an open circuit or magnetic wall. The TM01 mode is
trapped in the lining and decays in an exponential fashion towards
the center of the guide (Fig. 3e). The field configuration for the TE01
mode in the air region (Fig. 3f) is the dual of that for the TM01 mode in
Fig. 3a. Hence it will have high losses. The field configuration for the
TM02 mode (Fig. 3g) is now the dual of the TE01 mode in Fig. 3b, and
hence it is a low-loss mode. On a further increase in the lining thickness
to λ_rad/2, we find that TE01 and TM01 both propagate as a surface
wave bound to the lining (Fig. 3i and j), while TM02 has high loss
(Fig. 3k). TE02 (Fig. 3l) and TE03 (not shown in Fig. 3), conversely,
have low losses for this lining thickness.

In Fig. 4, we have plotted the loss for several TE0n and TM0n modes
at 100 GHz for lining thicknesses up to 5 percent. We find the results
for a change in lining thickness are similar to those for a change in
frequency. The TE0n and TM0n modes go through relative loss maxima
and minima for a λ_rad/4 change in lining thickness. An additional
mode also becomes trapped and propagates as a surface wave for
every λ_rad/4 change in thickness.

Fig. 4—Loss characteristics of circular symmetric modes at 100 GHz for
waveguide lining thicknesses up to 5 percent (wall impedance model of lined
waveguide).

In Fig. 5, some representative solutions of the appropriate eigenvalue
equations for TE0n and TM0n modes in dielectric lined guide
are given. The eigenvalue k_n is defined as

$$k_n = \chi_n a,$$

where a is the dielectric radius and χ_n is the radial propagation constant
in the air region. The eigenvalues for liners which are integral multiples
of a quarter wavelength thick in the radial direction are approximately
the same as the eigenvalues for circular symmetric modes in
empty guide with an electric or magnetic wall. The eigenvalue also
tends to zero as the thickness increases and eventually becomes purely
imaginary, which is indicative of a surface-wave phenomenon with
very large heat losses.
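The empty-guide limits mentioned above are zeros of the Bessel functions J0 and J1. As a sketch, and as our own derivation from the standard field components rather than something stated in the paper: at an electric wall TE0n requires J1(k) = 0 and TM0n requires J0(k) = 0, while at a magnetic wall the roles swap. With the text's observation that a quarter-wave lining acts roughly as a magnetic wall and no lining as an electric wall, the table below gives the approximate eigenvalues. Bessel functions are evaluated from their power series so the sketch needs no external library; the helper names are hypothetical.

```python
import math


def bessel_j(order, x, terms=60):
    """J_order(x) from its power series; adequate for the x < 12 range used here."""
    return sum(
        (-1) ** k * (x / 2.0) ** (2 * k + order)
        / (math.factorial(k) * math.factorial(k + order))
        for k in range(terms)
    )


def bessel_zeros(order, count, step=0.1):
    """First `count` positive zeros of J_order, by scanning for sign changes and bisecting."""
    zeros = []
    x = step
    f_prev = bessel_j(order, x)
    while len(zeros) < count:
        x_next = x + step
        f_next = bessel_j(order, x_next)
        if f_prev * f_next < 0:
            lo, hi = x, x_next
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if bessel_j(order, lo) * bessel_j(order, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        x, f_prev = x_next, f_next
    return zeros


# Which Bessel function must vanish at r = a (assumed boundary-condition mapping):
#   electric wall (tangential E = 0): TE0n -> J1, TM0n -> J0
#   magnetic wall (tangential H = 0): TE0n -> J0, TM0n -> J1 (k > 0)
EIGEN = {
    ("TE", "electric"): 1, ("TM", "electric"): 0,
    ("TE", "magnetic"): 0, ("TM", "magnetic"): 1,
}

if __name__ == "__main__":
    for (mode, wall), order in EIGEN.items():
        ks = ", ".join(f"{k:.3f}" for k in bessel_zeros(order, 3))
        print(f"{mode}0n, {wall} wall: k_n = {ks}")
```

The table also shows the duality used in the text: TE0n with an electric wall and TM0n with a magnetic wall share the same eigenvalues.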


III. CONCLUSION


In the preceding sections we have seen that modes other than those
of the circular electric type have low loss in dielectric lined guide. In
order to transmit a circular electric (not necessarily TE01) mode with
low loss, the lining must be significantly less than a quarter wavelength
or approximately an integral multiple of a half wavelength thick. On
the other hand, it is possible to use a dielectric lined guide with a
quarter-wavelength-thick liner (also 3λ/4, 5λ/4, etc.) as a low-loss
circular magnetic mode transmission medium. The tolerances on such a
system for mode conversion loss would be similar to those on the
appropriate dual circular electric guide.

The results also indicate there is a range of thicknesses or frequencies
for which both TM0n and TE0n modes have low loss. Since the local
character of the fields for any mode near the wall of a metallic waveguide
must be similar to that of either a TE0n or TM0n mode, there
will be a range of thicknesses and frequencies for which many quasi-TM
and quasi-TE modes have low loss (on the order of 6 dB/km or less)
in dielectric lined guide. (This has been confirmed by recent results.*)
This implies that the ohmic losses in route bends will be reduced.
Further, since the dielectric liner not only reduces the heat loss for
the spurious quasi-TE and quasi-TM modes generated by a route bend or
other guide deformation but also alters their field distributions relative
to those of unlined waveguide, it will be necessary to design mode
filters with this in mind.

Fig. 5—Eigenvalues for circular symmetric modes in lined waveguide.
A Relation for the Loss Characteristics
of Circular Electric and Magnetic Modes 
in Dielectric Lined Waveguide 


By J. W. CARLIN 
(Manuscript received December 28, 1970) 


Recent studies have shown that modes not of the circular electric 
type have low-loss characteristics in dielectric lined circular waveguide. 
It was determined that circular electric modes are low-loss for linings
approximately 0, λ/2, λ, ... thick, while circular magnetic
modes are low-loss for λ/4, 3λ/4, ... thick linings. In this paper
we derive a simple relationship between the loss characteristics
of circular electric waves with 0, λ/2, ... thick linings and circular
magnetic waves with λ/4, 3λ/4, ... thick linings.

Specifically, we show that the minimum obtainable circular magnetic
mode loss is at least four times greater than the minimum obtainable
circular electric mode loss. We also show that the minimum loss for
successively higher order circular electric (magnetic) modes corresponding
to approximately 0, λ/2, ... (λ/4, 3λ/4, ...) thick linings is
approximately the same if we neglect the dielectric losses.


I. INTRODUCTION 


Recent studies¹ indicate many modes not of the circular electric
type may have very low loss in dielectric lined circular waveguide. 
In these studies the duality principle was used to explain the low- 
loss characteristics of circular magnetic modes for linings having 
thicknesses which are an odd multiple of a quarter wavelength. 

In this paper we extend this use of the duality principle and derive 
a simple relation between the loss characteristics of circular electric 
and magnetic modes in dielectric lined circular waveguide. In all cases, 
we find the minimum heat loss obtainable as the dielectric thickness 
varies is greater for circular magnetic modes in comparison with the 
minimum circular electric mode heat loss. 
II. DISCUSSION


The waveguide under consideration is shown in Fig. 1. It consists of
a highly conducting outer wall to which a thin layer of dielectric
(of relative permittivity ε) is bonded. The liner is usually “electrically”
thin, and its sole function is to break the phase velocity
degeneracy between the TE01 and TM11 modes in hollow metal-walled
waveguide. In this paper we are concerned with the effect of
thicker linings (linings which are an integral multiple of a quarter
wavelength thick) on the conducting wall losses of the waveguide. We
will assume that the dielectric is lossless in this study.

The metal walls of the waveguide in Fig. 1a may be modeled as a
low-impedance termination (Z_1 ≪ η) for the fields interior to the walls.
Here η = √(μ_0/ε_0) is the characteristic impedance of the interior filler
(free space) in the guide. The dielectric liner is equivalent to a short
section of transmission line. This transmission line has a characteristic
impedance*

$$Z_{0e} = \frac{\eta}{\sqrt{\epsilon - 1}} \qquad (1)$$

for the electric field polarized parallel to the wall and a characteristic
impedance

$$Z_{0h} = \frac{\eta\sqrt{\epsilon - 1}}{\epsilon} \qquad (2)$$

for the magnetic field polarized parallel to the wall. The appropriate
transmission line propagation constant is

$$\beta = k_0\sqrt{\epsilon - 1}, \qquad (3)$$

where k_0 is the free-space propagation constant. The equivalent impedance
conditions (Z_e and Z_h) at the inner face of the dielectric in
Fig. 1b may be obtained from the transmission line parameters in (1),
(2), and (3) in the usual manner. The wall impedance guide in Fig.
1b is equivalent to the lined guide in Fig. 1a and may be used to
predict its electrical properties with a small error.
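"In the usual manner" refers to the standard lossless transmission-line input-impedance formula. The sketch below applies it with the line parameters (1) to (3). Several details are assumptions that are not in the paper: the wall impedance is modeled as a good conductor, Z_1 = (1 + j)√(πfμ_0/σ), with copper conductivity σ = 5.8 × 10⁷ S/m; frequency and permittivity are example values; and the helper names are hypothetical.

```python
import math

ETA0 = 376.730313668       # free-space impedance eta, ohms
MU0 = 4e-7 * math.pi       # classical value of mu_0, H/m
C0 = 299_792_458.0         # speed of light, m/s


def surface_impedance(freq_hz, sigma):
    """Assumed good-conductor wall model: Z1 = (1 + j) sqrt(pi f mu0 / sigma)."""
    rs = math.sqrt(math.pi * freq_hz * MU0 / sigma)
    return complex(rs, rs)


def liner_line(eps_r):
    """Equations (1)-(3): Z0e, Z0h and beta/k0 = sqrt(eps_r - 1) for the liner."""
    root = math.sqrt(eps_r - 1.0)
    return ETA0 / root, ETA0 * root / eps_r, root


def transform(z0, z_load, beta_t):
    """Input impedance of a lossless line of electrical length beta_t ending in z_load."""
    t = math.tan(beta_t)
    return z0 * (z_load + 1j * z0 * t) / (z0 + 1j * z_load * t)


def inner_face_impedances(freq_hz, eps_r, thickness_m, sigma=5.8e7):
    """(Z_e, Z_h) seen at the dielectric inner face of the equivalent wall-impedance guide."""
    z1 = surface_impedance(freq_hz, sigma)
    z0e, z0h, root = liner_line(eps_r)
    beta_t = (2.0 * math.pi * freq_hz / C0) * root * thickness_m
    return transform(z0e, z1, beta_t), transform(z0h, z1, beta_t)


if __name__ == "__main__":
    f, eps = 100e9, 2.34                              # assumed example values
    t_quarter = C0 / f / math.sqrt(eps - 1.0) / 4.0   # lambda_rad / 4 from the radial wavelength
    for t in (0.0, t_quarter, 2 * t_quarter):
        ze, zh = inner_face_impedances(f, eps, t)
        print(f"t = {t * 1e3:.4f} mm: Z_e = {ze:.4g} ohm, Z_h = {zh:.4g} ohm")
```

At a quarter-wave thickness (βt = π/2) both transforms invert the wall impedance, Z_in ≈ Z_0²/Z_1, which is the high-impedance, magnetic-wall-like condition used in the duality argument; at zero or half-wave thickness the wall impedance Z_1 is recovered.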


R. E. Collin has shown that if the fields E_1, H_1 are solutions of
Maxwell’s equations in a source-free region of free space, the dual
fields ηH_1, −(1/η)E_1 are also a solution. The same transformation is
applicable to the wall impedance guide in Fig. 1b, but we must also
transform the wall impedances. The appropriate dual is shown in
Fig. 2. The fields are transformed as in Collin, and the impedance wall
is replaced by an equivalent admittance wall. We may summarize the
duality principle for a guide of radius a, as in Fig. 2, as follows:

* The expressions in (1), (2), and (3) were derived from the plane-wave
scattering at grazing incidence by a grounded slab.

WAVEGUIDE LOSS CHARACTERISTICS 1641

Fig. 1—(a) Dielectric lined waveguide. (b) Equivalent wall impedance
waveguide.
From (4) we see that the dual of a circular electric mode in a guide with
a low-impedance wall (Z_1 ≪ η) is a circular magnetic mode in a guide
with a high-impedance or low-admittance (Y = (1/η²)Z_1 ≪ 1/η) wall.
Since the E × H product is invariant under the above transformation,
the loss characteristics of the two duals are identical.

Using the duality relations in (4), we can now obtain a simple relation
between the losses of circular electric and magnetic modes in dielectric
lined guide as shown in Fig. 3. We first consider the loss for a TE01
mode in Fig. 3a. Since there is no lining, the wall impedance at r = a is
simply Z_1, and the loss α_0^{TE01} for a TE01 mode in hollow guide is thus

$$\alpha_0^{\mathrm{TE}_{01}} = C\,\mathrm{Re}(Z_1). \qquad (5)$$

Fig. 2—Dual waveguides.
We now consider the loss for a TM02 mode in Fig. 3b. The input admittance
Y_h at r = a is easily determined for a quarter-wave lining
from the transmission line impedance in (2) as

$$Y_h = \frac{\epsilon^2}{\eta^2(\epsilon - 1)}\,Z_1. \qquad (6)$$

But this circular magnetic mode will have the same loss [see equation
(4)] as a circular electric mode in a guide with wall impedance

$$\eta^2 Y_h = \frac{\epsilon^2}{\epsilon - 1}\,Z_1.$$

We see that the equivalent wall impedance for the circular electric
mode dual is higher than the wall impedance of the unlined waveguide.
Hence, the loss for the TM02 mode in the guide shown in Fig.
3b is greater than that for the TE01 mode in Fig. 3a and is given by

$$\alpha_{\lambda/4}^{\mathrm{TM}_{02}} = C\,\frac{\epsilon^2}{\epsilon - 1}\,\mathrm{Re}(Z_1). \qquad (7)$$
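The quarter-wave step behind (6) and (7) can be written out with the standard lossless transmission-line transformation. This is a sketch of the intermediate algebra, assuming the liner acts as an ideal line with impedance (2) and propagation constant (3), terminated by the wall impedance Z_1:

```latex
Z_{\text{in}} = Z_{0h}\,\frac{Z_1 + jZ_{0h}\tan\beta t}{Z_{0h} + jZ_1\tan\beta t}
\;\xrightarrow{\;\beta t \to \pi/2\;}\; \frac{Z_{0h}^2}{Z_1},
\qquad
Y_h = \frac{Z_1}{Z_{0h}^2} = \frac{\epsilon^2}{\eta^2(\epsilon - 1)}\,Z_1,
\qquad
\eta^2 Y_h = \frac{\epsilon^2}{\epsilon - 1}\,Z_1 .
```

Since ε²/(ε − 1) ≥ 4 for every ε > 1, the dual wall impedance always exceeds Z_1, which is why the TM02 loss in (7) is always the larger one.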


If we now consider the TE02 mode in Fig. 3c, we see it has the same
fields for r < a as the TE01 mode in Fig. 3a and, hence, its loss will be
the same. We thus have

$$\alpha_{\lambda/2}^{\mathrm{TE}_{02}} = C\,\mathrm{Re}(Z_1). \qquad (8)$$


Fig. 3—Low-loss dielectric waveguides: (a) TE01 mode; (b) TM02 mode;
(c) TE02 mode.
In the above analysis we have neglected the power carried in the
dielectric regions of the guide. This is a reasonable approximation if
the lining thickness is much less than the guide radius. We have also
neglected the dielectric losses. Recent computer-generated results
indicate that this is a reasonable assumption, since a quarter-wave
polyethylene liner (ε = 2.34, tan δ = 83 X 107°) in 50-mm-diameter circular
guide at 100 GHz has a metal-wall loss for Cu walls of 1.79 dB/km,
while the dielectric loss is 0.48 dB/km for the TM02 mode.

The present study shows that the minimum circular magnetic mode
heat loss α_min^{TM} and the minimum circular electric mode heat loss
α_min^{TE} are related by

$$\frac{\alpha_{\min}^{\mathrm{TM}}}{\alpha_{\min}^{\mathrm{TE}}} = \frac{\epsilon^2}{\epsilon - 1} \qquad (9)$$

in dielectric lined guide. This ratio is a minimum at ε = 2, where it has
a value of 4. Relation (9) was found to agree well with computer-generated
results for the modal heat loss in dielectric lined guide.
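The minimum of (9) follows from writing ε²/(ε − 1) = (ε − 1) + 2 + 1/(ε − 1), which is at least 4 with equality when ε − 1 = 1. A short numerical check, as a sketch with a hypothetical function name; the ε = 2.34 value is the polyethylene example quoted above:

```python
def loss_ratio(eps_r):
    """Equation (9): minimum TM0n heat loss divided by minimum TE0n heat loss."""
    return eps_r ** 2 / (eps_r - 1.0)


if __name__ == "__main__":
    grid = [1.1 + 0.01 * i for i in range(300)]  # eps_r from 1.10 to about 4.09
    best = min(grid, key=loss_ratio)
    print(f"smallest ratio on the grid: {loss_ratio(best):.4f} at eps_r = {best:.2f}")
    print(f"eps_r = 2.34 (polyethylene example): ratio = {loss_ratio(2.34):.3f}")
```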


III. CONCLUSION 


We have seen that circular magnetic modes have a higher minimum
heat loss than circular electric modes in dielectric lined waveguide.
The ratio of the two losses was shown to be a simple function of the
lining's relative permittivity. The ratio was shown to have a minimum
value of 4 for a lossless dielectric of permittivity 2.

The results indicate a TM0n mode has possibilities as a long-haul
carrier in dielectric lined guide. The heat loss, however, is always at
least four times greater than that for a comparable TE0n mode. The
mode conversion loss for the two systems also can be shown to be
comparable by use of the duality principle.
Timing Recovery in PAM Systems


By R. D. GITLIN and J. SALZ 
(Manuscript received January 7, 1971) 


It is shown how various timing recovery schemes are reasonable 
approximations of the maximum likelihood strategy for estimating an 
unknown timing parameter in additive white gaussian noise. These 
schemes derive an appropriate error signal from the received data
which is then used in a closed-loop system to change the timing phase
of a voltage-controlled oscillator. The technique of stochastic approximation
is utilized to cast the synchronization problem as a regression
problem and to develop an estimation algorithm which rapidly converges
to the desired sampling time. This estimate does not depend
upon knowledge of the system impulse response, is independent of
the noise distribution, is computed in real time, and can be synthesized
as a feedback structure. As is characteristic of stochastic approximation
algorithms, the current estimate is the sum of the previous estimate
and a time-varying weighted approximation of the estimation
error. The error is approximated by sampling the derivative of the
received signal, and the mean-square error of the resulting estimate
is minimized by optimizing the choice of the gain sequence.

If the receiver is provided with an ideal reference (or if the data
error rate is small), it is shown that both the bias and the jitter (mean-square
error) of the estimator approach zero as the number of iterations
becomes large. The rate of convergence of the algorithm is
derived, and examples are provided which indicate that reliable
synchronization information can be quickly acquired.


I. INTRODUCTION 


The problem of symbol synchronization in digital data transmission 
in the presence of intersymbol interference is extremely complicated. 
The best sampling instants are channel dependent and are in general 
difficult to determine. Consequently, the problem of timing recovery in 
high-speed data transmission is intimately tied in with adaptive 
equalization. Since general methods for simultaneous optimum determination
of the receiver parameters are not known, these parameters
are independently determined.

Timing information is usually obtained directly from the data wave 
in a variety of ways. Our objectives in this paper are: 


(i) To indicate the optimum method (maximum likelihood) for
estimating an unknown timing parameter from random data
for a certain class of PAM data transmission systems;

(ii) To show that a variety of timing recovery methods currently
in use are reasonable approximations of the optimum method,
and to note that the generation of an error signal from the
received signal is a feature common to these methods;

(iii) To demonstrate that timing recovery dynamics can often be
studied and controlled through the application of stochastic
approximation theory.


Identifying the desired timing parameter as the solution of a re- 
gression equation will allow us to apply stochastic approximation 
theory to the symbol synchronization problem. For purposes of illustration
we analyze a stochastic approximation timing recovery procedure
for square-wave modulation. For this example we derive
asymptotic formulas for the probability of error as a function of
signal-to-noise ratio and the number of iterations used in the timing
recovery loop. Since the number of iterations is directly proportional 
to the number of signaling intervals, insight is provided into the setup 
time required to achieve reliable symbol synchronization. 

We finally focus on the more difficult problem of timing recovery 
in bandlimited PAM systems. Here timing information must be ob- 
tained in the presence of intersymbol interference as well as additive 
noise. A stochastic approximation algorithm is presented which derives 
symbol synchronization (i.e., estimates the desired sampling time) 
from the received data in a quick and accurate manner. The estima- 
tion algorithm developed does not require explicit knowledge of the 
system impulse response or the noise distribution. If the impulse 
response of the channel satisfies certain conditions, then the algorithm 
will converge in mean-square provided the gain sequence is properly 
chosen. Symbol synchronization is obtained by adjusting the sampling 
time in the following manner: at the end of each symbol interval the 
current estimate is taken to be the sum of the previous estimate and a 
weighted approximation to the actual estimation error. The desired 
sampling time is assumed to be that instant when the system impulse 
response is a maximum. For this sampling time it is shown that a
reasonable approximation to the estimation error is the sampled
derivative of the received signal.† When the error is small, its evolution
can be described by a first-order random difference equation. At
every iteration the mean-square error (mse) can be minimized by
optimizing the choice of the (time-varying) weighting sequence. The
optimum weighting sequence is of the form 1/(α + βn), where α and
β are quantities which depend on the system impulse response and
noise power, and n is the discrete time index. Since α and β are generally
unknown at the receiver, they may either be estimated (giving
rise to an adaptive synchronization algorithm) or picked arbitrarily.
In an effort to overcome the lack of knowledge of α and β (in addition
to simplifying the algorithm) it is tempting to use the asymptotic form
of the gain c/n, where c is a constant. However, if β ≪ α, then the
optimum gain is essentially a constant (1/α) for many iterations, and
for a wide range of c the estimate obtained using c/n is shown to be
unreliable. Hence it appears that in order to obtain satisfactory performance
some adaptivity to determine α and β should be used in any
realization of the algorithm.
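Why the optimum gain takes the form 1/(α + βn) can be seen in a toy version of the first-order error recursion. The model here is ours, not the paper's: e_{n+1} = e_n − c_n(s e_n + w_n), with slope s > 0 and zero-mean noise w_n of variance σ² independent of e_n. Minimizing v_{n+1} = (1 − c_n s)² v_n + c_n² σ² over c_n gives c_n = s v_n/(s² v_n + σ²), and with initial mse v_0 this is exactly 1/(α + βn) with α = s + σ²/(s v_0) and β = s, while the mse falls as 1/(1/v_0 + n s²/σ²), i.e. asymptotically like 1/(ρn) with ρ = s²/σ². These α and β belong to the toy model only. The sketch also runs the same recursion with c/n for comparison; parameter values are illustrative assumptions.

```python
def optimal_gains(s, sigma2, v0, iterations):
    """Per-step mse-minimizing gains for the assumed model e_{n+1} = e_n - c_n (s e_n + w_n)."""
    gains, mses = [], [v0]
    v = v0
    for _ in range(iterations):
        c = s * v / (s * s * v + sigma2)          # minimizes (1 - c s)^2 v + c^2 sigma2
        v = (1.0 - c * s) ** 2 * v + c * c * sigma2
        gains.append(c)
        mses.append(v)
    return gains, mses


def mse_with_gains(s, sigma2, v0, gains):
    """mse recursion of the same toy model for an arbitrary gain sequence, e.g. c / n."""
    v, out = v0, [v0]
    for c in gains:
        v = (1.0 - c * s) ** 2 * v + c * c * sigma2
        out.append(v)
    return out


if __name__ == "__main__":
    s, sigma2, v0, n_iter = 1.0, 4.0, 0.05, 20    # assumed illustrative values (beta << alpha)
    gains, mse_opt = optimal_gains(s, sigma2, v0, n_iter)
    alpha = s + sigma2 / (s * v0)                 # beta = s in this toy model
    print("c_0 =", gains[0], " 1/alpha =", 1.0 / alpha)
    for c in (0.5, 1.0, 2.0):
        mse_cn = mse_with_gains(s, sigma2, v0, [c / n for n in range(1, n_iter + 1)])
        print(f"c = {c}: mse after {n_iter} steps {mse_cn[-1]:.4f}, optimum {mse_opt[-1]:.4f}")
```

With β ≪ α, as here, each of these c/n choices takes a first step far larger than 1/α, and the mse jumps well above its starting value before it begins to decay.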

Under the assumptions that the receiver error rate is small (so that
an ideal reference can be assumed) and that the “eye” of the differentiated
impulse response is open, the optimum mse is asymptotically
of the form 1/ρn, where ρ is a “signal-to-noise” ratio. The “signal”
term is the value of the slope of the differentiated impulse response
near the origin; the “noise” term is the sum of the actual noise variance
and two intersymbol interference type terms. Thus the mse can
be driven to zero, and an example is given to illustrate how an accurate
estimate can be obtained in a few signaling intervals. We show that for
a sin x/x impulse response, ten iterations will drive the mean-square
error to less than 0.01 of a signaling interval.

In Section II we determine the maximum likelihood estimate of
an unknown timing parameter for a baseband PAM data signal which
has been contaminated by white gaussian noise. Several approximations
to the optimum estimator are described in Section III. The
theory of stochastic approximation is introduced in Section IV, and is
used both to cast the synchronization problem as a regression problem
and to analyze and control the dynamics of timing recovery. In Section
V we discuss a timing recovery algorithm for bandlimited PAM.

† B. R. Saltzberg has suggested a technique for timing recovery which uses
this approximation. His investigation is restricted to algorithms which can be
realized using time-invariant devices. The algorithm we develop exploits the
advantages of using time-varying elements.


II. THE MAXIMUM LIKELIHOOD ESTIMATOR OF AN UNKNOWN TIMING 
PARAMETER 


Consider the L-level data wave in additive white gaussian noise ν(t)
of double-sided spectral density N_0,

$$V(t) = \sum_n a_n h(t - nT - \tau^*) + \nu(t), \qquad (1)$$

where {a_n} are the data symbols taking on values ±d, ±3d, ...,
±(L − 1)d with equal probability, h(t) is a bandlimited pulse whose
peak value occurs at τ*, and −T/2 ≤ τ* ≤ T/2 is an unknown timing
parameter.†

Detection of the data symbols {a_n} is usually accomplished by first
suitably filtering V(t) and then sampling the output at time instants
τ + kT, k = 0, ±1, ±2, .... The resulting error rate is a function of τ
in addition to other parameters. An ideal timing recovery system
would supply the detector with the τ which minimizes the probability of
error. While this problem is conceptually straightforward, it is not
analytically tractable, and the structure of such an optimum timing
recovery system is not yet generally known. We therefore must resort
to a less utopian criterion.

Much simpler evaluation functions often used in data transmission
are

$$D_q(\tau - \tau^*) = \frac{1}{|h(\tau - \tau^*)|^q} \sum_{k \neq 0} |h(\tau - \tau^* - kT)|^q, \qquad q = 1 \text{ or } 2. \qquad (2)$$

Even for these relatively simple evaluation functions it is generally
difficult to find the optimum τ. R. W. Chang derives timing recovery
procedures based on minimizing a particular version of equation (2).
However, for a certain class of linear distortions, namely the type that
gives rise to symmetrical pulse shapes, the best τ, which minimizes
(2), is equal to the unknown parameter τ*. For this class of channels
the problem of optimal timing recovery procedures can be cast in the
language of statistical estimation theory. This is the situation treated
in this section.
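The claim about symmetrical pulses can be illustrated numerically with the evaluation function (2). The sketch below uses an assumed Gaussian-shaped pulse h(t) = exp(−t²) with T = 1. The pulse is symmetric but has nonzero intersymbol interference, and the infinite sum is truncated at |k| ≤ 20. Both choices, and the helper names, are illustrative assumptions.

```python
import math


def distortion(pulse, offset, q=2, T=1.0, K=20):
    """Equation (2): D_q at sampling offset x = tau - tau*, with the sum truncated to |k| <= K."""
    main = abs(pulse(offset)) ** q
    isi = sum(abs(pulse(offset - k * T)) ** q for k in range(-K, K + 1) if k != 0)
    return isi / main


def gaussian_pulse(t, a=1.0):
    """Assumed symmetric pulse with nonzero intersymbol interference."""
    return math.exp(-a * t * t)


if __name__ == "__main__":
    offsets = [i / 100 for i in range(-40, 41)]  # tau - tau* from -0.4T to 0.4T
    for q in (1, 2):
        best = min(offsets, key=lambda x: distortion(gaussian_pulse, x, q=q))
        print(f"q = {q}: D_q is smallest at tau - tau* = {best:+.2f} T")
```

For this pulse the ratio h(x − k)/h(x) pairs up over ±k into cosh terms that are smallest at x = 0, so the grid search lands on τ = τ*, as the text asserts for symmetric pulses.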


† We assume throughout that τ* is independent of time.
The statistical problem we pose is this: determine an estimation
procedure for the parameter τ based on observations made on the
received signal V(t) [equation (1)]. The more detailed question we
wish to answer is the following. How should the observed signal, say
for T_0 seconds, be processed such that a “good” estimate of τ* is
obtained? The answer of course depends on what one means by good.
A reasonable measure of goodness is to require that the estimate
maximize the likelihood function of the unknown parameter. For
binary transmission this is a classical problem for which a solution
is known. (See for example Refs. 3, 10, and 11.)† The extension to
multilevel signaling is straightforward, and we now briefly sketch the
derivation. The likelihood function of the received signal is proportional
to (superfluous constants are omitted)

$$L[V] \sim E_a\left\{\exp\left[-\frac{1}{2N_0}\int_0^{T_0} [V(t) - s(t;\tau)]^2\,dt\right]\right\}, \qquad (3)$$

where s(t; τ) = Σ_n a_n h(t − nT − τ) and E_a{·} denotes expectation
with respect to the data symbols. The expectation indicated in (3)
can be carried out provided the reasonable assumption is made that
the power in the data signal s(t; τ), when measured over an interval
[0, T_0] (large compared with a symbol duration), is independent of the
data sequence and the unknown parameter τ. This assumption leads
to a simplified version of (3),

$$L[V] \sim E_a\left\{\exp\left[\frac{1}{N_0}\int_0^{T_0} V(t)\,s(t;\tau)\,dt\right]\right\} \qquad (4)$$

$$L[V] \sim \prod_n \left\{\sum_{k=1}^{L/2} \cosh\left[\frac{(2k-1)d}{N_0}\,z_n(\tau)\right]\right\}, \qquad (5)$$

where

$$z_n(\tau) = \int_0^{T_0} V(t)\,h(t - nT - \tau)\,dt \qquad (6)$$

is recognized as the sampled (at times nT + τ) output of a filter
matched to h(t), whose input is V(t).

† None of the references cited claims originality. It is difficult to determine
where the result was written down first.

The maximum likelihood estimate (MLE) is obtained by differentiating
L[V] with respect to τ and setting the resulting expression
to zero. An equivalent strategy may be obtained by differentiating any
monotonic function of L, and a convenient such function in this application
is the logarithmic function. From equation (5)

$$\Lambda[V] = \ln L[V] \sim \sum_n \ln\left\{\sum_{k=1}^{L/2} \cosh\left[\frac{(2k-1)d}{N_0}\,z_n(\tau)\right]\right\}, \qquad (7a)$$

and upon differentiation we obtain

$$\frac{\partial \Lambda}{\partial \tau} = \sum_n \left[\frac{\sum_{k=1}^{L/2}(2k-1)\sinh\left(\frac{(2k-1)d}{N_0}z_n(\tau)\right)}{\sum_{k=1}^{L/2}\cosh\left(\frac{(2k-1)d}{N_0}z_n(\tau)\right)}\right]\frac{d}{N_0}\,\frac{dz_n(\tau)}{d\tau}, \qquad (7b)$$

where the bracketed term can be shown to be†

$$\frac{(L-1)\sinh\left(\frac{(L+1)d}{N_0}z_n(\tau)\right) - (L+1)\sinh\left(\frac{(L-1)d}{N_0}z_n(\tau)\right)}{\cosh\left(\frac{(L+1)d}{N_0}z_n(\tau)\right) - \cosh\left(\frac{(L-1)d}{N_0}z_n(\tau)\right)}, \qquad (7c)$$

and for the typical communication environment of a large signal-to-noise
ratio the above expression becomes proportional to

$$(L-1)\tanh\left(\frac{(L+1)d}{N_0}\,z_n(\tau)\right).$$

Thus we finally have that

$$\frac{\partial \Lambda}{\partial \tau} \propto \sum_n \tanh\left(\frac{(L+1)d}{N_0}\,z_n(\tau)\right)\frac{dz_n(\tau)}{d\tau}. \qquad (8)$$

The optimum estimation strategy is exhibited in equation (8). The best
value of τ (i.e., the MLE) makes the right-hand side of equation (8)
as small as possible. The mathematical operations exhibited in equation
(8) can readily be instrumented. The implementation objective
would be to use the right-hand side of (8) as an error signal in a
closed-loop system that iteratively adjusts τ to determine the MLE.
A block diagram of this implementation is shown in Fig. 1. The received
signal and its derivative are first passed through filters with
identical impulse responses h(−t) whose outputs are periodically
sampled at times nT + τ. In the undifferentiated branch, the samples
are first multiplied by (L + 1)d/N_0 and are then passed through the
memoryless nonlinearity tanh(·), which resembles an infinite clipper
for large input values. The outputs from the two branches are multiplied
and averaged as indicated by the sum in equation (8). This
then is the error signal driving a voltage-controlled oscillator which
in turn determines the new timing phase.

† Note that for L = 2, equation (7c) becomes (sinh 3y − 3 sinh y)/(cosh 3y −
cosh y) = tanh y, which agrees with the bracketed term in (7b).

TIMING RECOVERY IN PAM 1651

Fig. 1—Implementation of maximum likelihood strategy. [Block diagram labels:
front-end filter, tanh(·) branch, timing.]
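The closed form (7c) and its large signal-to-noise behavior can be checked numerically. Writing y = d z_n(τ)/N_0 (our shorthand), the sum Σ_k cosh((2k − 1)y) equals sinh(Ly)/(2 sinh y), so the bracketed term of (7b) is L coth(Ly) − coth y, which tends to L − 1 for large y, as does (L − 1) tanh((L + 1)y). A minimal sketch with hypothetical function names:

```python
import math


def bracket_sum(y, L):
    """Bracketed term of (7b), with y = d * z_n(tau) / N0 and L an even number of levels."""
    ks = range(1, L // 2 + 1)
    num = sum((2 * k - 1) * math.sinh((2 * k - 1) * y) for k in ks)
    den = sum(math.cosh((2 * k - 1) * y) for k in ks)
    return num / den


def bracket_closed(y, L):
    """Closed form (7c); it is 0/0 at y = 0, so evaluate it only for y != 0."""
    num = (L - 1) * math.sinh((L + 1) * y) - (L + 1) * math.sinh((L - 1) * y)
    den = math.cosh((L + 1) * y) - math.cosh((L - 1) * y)
    return num / den


if __name__ == "__main__":
    for L in (2, 4, 8):
        for y in (0.2, 1.0, 4.0):
            approx = (L - 1) * math.tanh((L + 1) * y)   # large-SNR form used before (8)
            print(L, y, bracket_sum(y, L), bracket_closed(y, L), approx)
```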


III. IMPLEMENTATIONS APPROXIMATING THE OPTIMUM


We now examine approximations of equations (7) and (8) leading to
several simplified implementations of timing recovery systems. The
first approach is to approximate tanh(x) in equation (8) by the
limiter function sgn(x). This approximation yields

$$\tanh\left(\frac{(L+1)d}{N_0}\,z_n(\tau)\right) \approx \operatorname{sgn} z_n(\tau) = \operatorname{sgn} \hat{a}_n, \qquad (9)$$

where â_n is the nth decision, or the estimate of the nth data symbol.
The approximation (9) is a good one at large signal-to-noise ratio, and
in this case â_n will equal a_n most of the time. When this approximation
is made, the detection circuit which computes â_n from z_n(τ) is separated
from the timing circuit. In the timing branch the received
signal is first passed through the filter with impulse response h(−t),
and the output is differentiated or equivalently passed through a high-pass
filter and then sampled. These samples are multiplied by the sign
of the respective decisions and summed to form an error signal. The
multiplication of the respective derivative samples by the sign of the
decisions is clearly necessary so as to convert all the error samples
to the same polarity.* Figure 2 shows this simplified version of the
detection and timing recovery circuit. Deriving an error signal from the
derivative of the received signal is very reasonable, and a timing
circuit based on this idea has been built and analyzed by Saltzberg.

* This is a decision directed estimation procedure. As the timing phase is
acquired, the decisions become more reliable.

Fig. 2—An implementation approximating the ideal.
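The decision-directed error of Fig. 2 can be illustrated with a small Monte Carlo sketch. Everything here is an assumption for illustration, not the paper's setup: binary symbols ±1, T = 1, an assumed Gaussian-shaped overall pulse g(t) = exp(−2t²) standing in for the filtered wave, gaussian noise added to each derivative sample, and ideal decisions (sgn â_n = sgn a_n, the small-error-rate case). Because the symbols are independent and zero-mean, the average of sgn(a_n)·y′(nT + τ) is approximately E|a_n|·g′(τ − τ*). This is positive when sampling early, negative when late, and zero at the pulse peak.

```python
import math
import random


def g(t):
    """Assumed overall pulse after filtering: symmetric, with its peak at t = 0."""
    return math.exp(-2.0 * t * t)


def g_prime(t):
    """Derivative of g."""
    return -4.0 * t * g(t)


def mean_dd_error(offset, n_symbols=4000, noise_std=0.1, span=6, seed=1):
    """Average of sgn(a_n) * y'(nT + tau) with ideal decisions; offset = tau - tau*, T = 1."""
    rng = random.Random(seed)
    a = [rng.choice((-1.0, 1.0)) for _ in range(n_symbols)]
    total = 0.0
    for n in range(span, n_symbols - span):
        # derivative of the filtered wave at the sampling instant, with ISI from +/- span symbols
        deriv = sum(a[m] * g_prime(offset + (n - m)) for m in range(n - span, n + span + 1))
        deriv += rng.gauss(0.0, noise_std)
        total += math.copysign(1.0, a[n]) * deriv   # multiply by the sign of the decision
    return total / (n_symbols - 2 * span)


if __name__ == "__main__":
    for x in (-0.2, -0.1, 0.0, 0.1, 0.2):
        print(f"tau - tau* = {x:+.1f} T: mean error {mean_dd_error(x):+.3f}, g'(x) = {g_prime(x):+.3f}")
```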

Another technique suggested from (7) is dubbed “early-late” timing
recovery. The approximations involved here are the following. First,
the derivative of Λ[V] is approximated by the difference

$$\sum_n \left\{\ln\cosh\left[\frac{kd}{N_0}\,z_n(\tau + \Delta)\right] - \ln\cosh\left[\frac{kd}{N_0}\,z_n(\tau - \Delta)\right]\right\}, \qquad \Delta < T. \qquad (10)$$

Next the nonlinear function ln[cosh(x)] is approximated by |x|. This
again is a good approximation at large signal-to-noise ratio, since for
large |x|, cosh x → e^{|x|}/2. This implementation is shown in Fig. 3. Here
two clock pulses separated by 2Δ sample the received wave after
appropriate filtering. The respective samples are then full-wave rectified
and subtracted from one another. The error signal is formed by
adding a number of successive differences. It appears that any even
Nth-law device may be used in place of the ln(cosh) nonlinearity in
equation (7). Successful results, for instance, were obtained with a
square-law device.
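The early-late error can be sketched the same way. This is a noise-free illustration under assumed conditions: binary symbols ±1, T = 1, an assumed Gaussian-shaped filtered pulse g(t) = exp(−2t²), and Δ = 0.25T. It averages law(y(nT + τ + Δ)) − law(y(nT + τ − Δ)) with either a full-wave rectifier (abs) or a square-law device. For a symmetric pulse the average error vanishes at τ = τ* and its sign indicates early or late sampling with either law. The names are hypothetical.

```python
import math
import random


def g(t):
    """Assumed overall pulse after the front-end filter: symmetric, peak at t = 0."""
    return math.exp(-2.0 * t * t)


def square(v):
    """Square-law device, an alternative to the full-wave rectifier abs()."""
    return v * v


def early_late_error(offset, delta=0.25, law=abs, n_symbols=4000, span=6, seed=2):
    """Average of law(y(nT + tau + delta)) - law(y(nT + tau - delta)); offset = tau - tau*, T = 1."""
    rng = random.Random(seed)
    a = [rng.choice((-1.0, 1.0)) for _ in range(n_symbols)]

    def sample(n, shift):
        # noise-free filtered wave at nT + tau + shift, with ISI from +/- span symbols
        return sum(a[m] * g(offset + shift + (n - m)) for m in range(n - span, n + span + 1))

    total = 0.0
    for n in range(span, n_symbols - span):
        total += law(sample(n, +delta)) - law(sample(n, -delta))
    return total / (n_symbols - 2 * span)


if __name__ == "__main__":
    for x in (-0.2, 0.0, 0.2):
        rect = early_late_error(x)
        sq = early_late_error(x, law=square)
        print(f"tau - tau* = {x:+.1f} T: rectifier {rect:+.3f}, square law {sq:+.3f}")
```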

A feature common to the above timing recovery systems is the gen- 
eration of an error signal from the received signal. The sampling 
instant is then adjusted so as to decrease the magnitude of the error, 
a new error is computed, and the estimation continues in this manner. 
The fewer the number of iterations needed to obtain a reliable estimate,
the better the system. Stochastic approximation is a technique
which will enable us to study and control the dynamic behavior of 
such iterative estimation algorithms by viewing the synchronization 
problem as a regression problem. 


IV. THE APPLICATION OF STOCHASTIC APPROXIMATION TO SYMBOL 
SYNCHRONIZATION 


4.1 Stochastic Approximation 

We will briefly describe the salient features of stochastic approximation,
in particular the Robbins-Monro algorithm. Stochastic approximation
is a technique employed to iteratively solve regression
problems. The method is an extension of the Newton-Raphson technique
to a random environment and is especially useful when the
regression function is unknown. More precisely, suppose x_n is a
sequence of independent observations of a stationary random process
and it is desired to find the value of the (nonrandom) parameter τ
such that the regression equation

$$E[f(x_n; \tau)] = m(\tau) = m_0 \qquad (11)$$

is satisfied, where E denotes expectation, f(·) is a given function, and
m(·) is called the regression function. As mentioned above, m(·) is
typically unknown, and we desire an algorithm which uses the data to
sequentially estimate the value of τ, say τ*, which satisfies (11).
Fig. 3—Implementation of early-late timing recovery scheme.

Robbins and Monro have shown that if (11) has a unique solution, then the estimate $\tau_n$, given by

$$\tau_{n+1} = \tau_n - c_n\,[f(x_n;\tau_n) - m_0], \qquad n = 1, 2, \ldots,$$

will converge in mean square and with probability one to $\tau^*$, under some general conditions on both the observations $x_n$ and the positive scalar time-varying weighting sequence $c_n$. A useful interpretation of the Robbins-Monro algorithm is that the current estimate is the sum of the previous estimate and a weighted correction term, where the average correction (with respect to the observations) is proportional to the error term $m(\tau_n) - m_0$. Thus the correction term will, on the average, give an increment in the correct direction, and the estimate will converge. Alternatively, if we regard the correction term as an approximation (in a stochastic sense) to an error term, we are reminded of deterministic error-correction or gradient-search algorithms. The weighting sequence $c_n$ is chosen to converge to zero fast enough to suppress the correction term as the estimate converges,* but slowly enough that large corrections are possible for many iterations (frequently $c_n$ is of the form $1/n$).
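The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regression function $m(\tau) = 2(\tau - 0.3)$, the unit-variance observation noise, the target $m_0 = 0$ and the gain $c_n = 1/n$ are all hypothetical choices. The sketch shows only that noisy corrections, weighted by a decaying gain, drive the estimate toward the root $\tau^*$.

```python
import random


def robbins_monro(observe, m0, tau_init, n_iter, gain=lambda n: 1.0 / n):
    """Robbins-Monro iteration tau_{n+1} = tau_n - c_n * (f(x_n; tau_n) - m0).

    observe(tau) returns one noisy observation f(x_n; tau) whose mean is the
    (unknown) regression function m(tau).  The minus sign assumes m(tau)
    increases through the root tau*.
    """
    tau = tau_init
    for n in range(1, n_iter + 1):
        tau = tau - gain(n) * (observe(tau) - m0)
    return tau


# Hypothetical example: m(tau) = 2 (tau - 0.3) observed in unit-variance noise,
# so the root of m(tau) = 0 is tau* = 0.3.
rng = random.Random(0)


def noisy_f(tau):
    return 2.0 * (tau - 0.3) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_f, m0=0.0, tau_init=0.0, n_iter=20000)
print(f"estimate of tau*: {estimate:.4f}")
```

With $c_n = 1/n$ the first corrections are large and later ones are strongly damped, which is the trade-off described above for the weighting sequence.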

We now cast the synchronization problem as a regression problem, 
and then use the theory of stochastic approximation to develop a 
synchronization algorithm which has desirable dynamic properties. 
From (8) the optimum (maximum likelihood) timing parameter is the solution of

$$\frac{\partial}{\partial\tau}[\Lambda(x;\tau)] = 0.$$


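If we make the identification

$$\frac{\partial}{\partial\tau}[\Lambda(x_n;\tau)] \triangleq f(x_n;\tau), \qquad (12a)$$

and now ask for the value of $\tau$ which satisfies

$$m(\tau) = E\left[\frac{\partial}{\partial\tau}\Lambda(x_n;\tau)\right] = 0, \qquad (12b)$$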


then the desired timing parameter [i.e., the solution of (12b)] will be the solution of a regression equation. It is important to note that the solutions of (8) and (12b) will not, in general, be the same. However, the solution of (8) is a random variable which, as the observation time $T_0$ becomes large, converges to $\tau^*$, while the solution† of (12b) is in fact $\tau^*$. Thus if we use a Robbins-Monro algorithm to iteratively solve (12b), we are indeed generating the maximum likelihood estimate.

* Note that even when $\tau_n$ is close to $\tau^*$, the variance of the correction term can be quite large due to the randomness of the data.


4.2 Binary Square-Wave Modulation 

Consider applying this method to analyze a timing recovery procedure when $h(t)$ in equation (1) is a rectangular pulse of $T$ seconds duration and height $A$, where binary transmission is assumed for convenience. In this case, the observable function, equation (6), becomes

$$z_n(\tau) = \int_{nT+\tau}^{(n+1)T+\tau} V(t)\,dt. \qquad (13)$$


As mentioned earlier, we can use a square-law device to approximate the ln cosh(·) nonlinearity for mathematical convenience. Thus the MLE is obtained by finding a $\tau$ such that the derivative of $\sum_n z_n^2(\tau)$ is zero. From (7a) and (13) we obtain

$$\frac{\partial}{\partial\tau}\sum_n z_n^2(\tau) = 2\sum_n \left[V((n+1)T+\tau) - V(nT+\tau)\right] z_n(\tau). \qquad (14)$$


At large signal-to-noise ratio, symbol transition information is obtained from

$$d_n = V((n+1)T+\tau) - V(nT+\tau) \approx \begin{cases} 0, & a_{n+1}a_n = 1,\\ \pm 1, & a_{n+1}a_n = -1. \end{cases} \qquad (15)$$

The Robbins-Monro procedure for recursively estimating $\tau$ can now be applied by using the regression function

$$m(\tau) = E\{d_n z_n(\tau)\}. \qquad (16)$$


For convenience we center the pulse $h(t)$ at $t = 0$ such that

$$h(t) = \begin{cases} A, & |t| \le T/2,\\ 0, & \text{elsewhere,} \end{cases}$$

and calculate

$$d_n z_n(\tau) = d_n \int_{nT+\tau}^{(n+1)T+\tau} V(t)\,dt = d_n A\left\{a_n\int_{nT+\tau}^{nT+T/2} dt + a_{n+1}\int_{nT+T/2}^{(n+1)T+\tau} dt\right\} + \nu_n = d_n A\{a_n(T/2 - \tau) + a_{n+1}(T/2 + \tau)\} + \nu_n, \qquad (17)$$

where

$$\nu_n = d_n \int_{nT+\tau}^{(n+1)T+\tau} v(t)\,dt.$$

† For a high signal-to-noise ratio.


In the absence of data transitions, (17) is independent of $\tau$, while when a transition occurs, i.e., $a_n \ne a_{n+1}$,

$$d_n z_n(\tau) = 2A\tau + \nu_n, \qquad -T/2 \le \tau \le T/2. \qquad (18)$$


Using (18), the recursive procedure for estimating the unknown timing parameter $\tau$ is now as follows. Pick an arbitrary sampling phase $\tau_0$, $|\tau_0| \le T/2$, and compute the next sampling phase $\tau_1$ from the relation (assuming that a data transition occurs)

$$\tau_1 = \tau_0 - \tfrac{1}{2}\,d_0 z_0(\tau_0) = \tau_0 - \tfrac{1}{2}\,(2A\tau_0 + \nu_0). \qquad (19)$$

The $(n + 1)$th sampling phase is then related to the $n$th by the recursion relation

$$\tau_{n+1} = \tau_n - \frac{1}{n+2}\,d_n z_n(\tau_n), \qquad (20)$$

where we have taken $c_n$ to be $1/(n+1)$, the update that forms $\tau_{n+1}$ using $c_{n+1} = 1/(n+2)$. For numerical evaluation purposes it is convenient to normalize (18) and work with the regression function†


$$m(\tau_n) = E[f(x_n,\tau_n)] = E[\tau_n + x_n], \qquad (21)$$

where $\{x_n\}$ is a sequence of Gaussian random variables with

$$E\{x_n\} = 0 \qquad (22)$$

and

$$E\{x_n^2\} = \frac{N_0 T}{4A^2} = \frac{T^2}{8\rho},$$

where $\rho = A^2 T/(2N_0)$ is the signal-to-noise ratio in a bit-rate bandwidth.

† We are assuming that a linear theory applies, i.e., the sequence $\{\tau_n\}$ rarely exceeds $|T/2|$. In practice, no value of $\tau_n$ which exceeds $|T/2|$ will be accepted. Including these restrictions in the mathematical model would render equations (19), (20), and (21) nonlinear and thus mathematically intractable.
Upon substituting (21) into (20), a linear recursion relation is obtained with the well-known solution

$$\tau_n = \frac{\tau_0}{n+1} - \frac{1}{n+1}\sum_{k=0}^{n-1} x_k, \qquad |\tau_n| \le T/2. \qquad (23)$$

By inspection the following pertinent parameters are computed:

$$\mu_n = E[\tau_n] = \frac{\tau_0}{n+1} \to 0 \quad \text{as } n \to \infty,$$

$$\sigma_n^2 = E\{\tau_n^2\} - E^2[\tau_n] = \operatorname{var}\tau_n = \frac{nT^2}{8\rho(n+1)^2} \to 0 \quad \text{as } n \to \infty. \qquad (24)$$

In evaluating (24) we assumed that the sequence of random variables $\{x_n\}$ is independent. This is not strictly true. We see from (17) that, for fixed $\tau$, the sequence $\{x_n\}$ is indeed independent, since each $x_n$ represents a nonoverlapping integral of the white-noise process $v(t)$. However, as $\tau_n$ is changed according to equation (20), the noise integrals may overlap. Including this dependence in the analysis would render this seemingly simple problem mathematically intractable. Physically, however, we feel that this dependence is weak and can therefore be neglected.

From (23) we see that $\tau_n$ possesses a truncated Gaussian probability density

$$p(\tau_n) = \begin{cases} p_1\,\delta(\tau_n - T/2) + p_2\,\delta(\tau_n + T/2) + G(\tau_n), & |\tau_n| \le T/2,\\ 0, & |\tau_n| > T/2, \end{cases} \qquad (25)$$

where

$$G(\tau_n) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left[-\frac{(\tau_n - \mu_n)^2}{2\sigma_n^2}\right]$$

and

$$p_1 = \int_{T/2}^{\infty} G(\tau_n)\,d\tau_n, \qquad p_2 = \int_{-\infty}^{-T/2} G(\tau_n)\,d\tau_n.$$
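The moments in (24) can be checked by simulating the normalized linear recursion directly. This Monte Carlo sketch rests on assumptions that are not stated in the text: the update that forms $\tau_{n+1}$ uses the gain $1/(n+2)$ (the indexing that reproduces $\mu_n = \tau_0/(n+1)$), the $x_n$ are Gaussian with $E\{x_n^2\} = T^2/(8\rho)$, the clipping at $\pm T/2$ is ignored (the linear theory of the footnote), and the values $\tau_0 = T/2$, $n = 30$, $\rho = 10$ are illustrative.

```python
import numpy as np


def simulate_tau(tau0, n_steps, rho, T=1.0, trials=20000, seed=0):
    """Monte Carlo of the normalized timing recursion built from (20) and (21).

    Assumes the update forming tau_{n+1} uses gain 1/(n+2), and ignores
    the |tau| <= T/2 clipping (linear theory).
    """
    rng = np.random.default_rng(seed)
    sigma_x = T / np.sqrt(8.0 * rho)        # E{x_n^2} = T^2 / (8 rho)
    tau = np.full(trials, float(tau0))
    for n in range(n_steps):
        x = rng.normal(0.0, sigma_x, size=trials)
        tau = tau - (tau + x) / (n + 2)     # correction f(x_n, tau_n) = tau_n + x_n
    return tau


def predicted_moments(tau0, n, rho, T=1.0):
    """Mean and variance of tau_n from (24)."""
    return tau0 / (n + 1), n * T**2 / (8.0 * rho * (n + 1) ** 2)


samples = simulate_tau(tau0=0.5, n_steps=30, rho=10.0)   # tau0 = T/2, worst guess
mu_pred, var_pred = predicted_moments(0.5, 30, 10.0)
print(samples.mean(), mu_pred)
print(samples.var(), var_pred)
```

Multiplying the recursion by $(n+2)$ shows why this indexing works: $(n+2)\tau_{n+1} = (n+1)\tau_n - x_n$, which telescopes to (23).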


Using this probability density we can compute the system error rate. Dispensing with tedious computational details and focusing attention on essentials, we find that the conditional error rate (conditioned on the unknown parameter $\tau_n$) for this simple system is asymptotically (for large signal-to-noise ratio)

$$P_e(\tau_n) \approx \exp\left[-\frac{A^2(T - 2|\tau_n|)^2}{2\sigma^2}\right], \qquad |\tau_n| \le T/2, \qquad (26)$$

where $\sigma^2 = N_0 T$. When $\tau_n = 0$, we have ideal performance, as we should. When $\tau_n = \pm T/2$, we have disaster. To obtain the actual error rate we must average (26) over the permissible values of $\tau_n$. This calculation yields

$$P_e = E\{P_e(\tau_n)\} \approx p_1 + p_2 + \int_{-T/2}^{T/2} P_e(\tau_n)\,G(\tau_n)\,d\tau_n. \qquad (27)$$


The evaluation of (27) is straightforward. In terms of the normalized random variable $\alpha = \tau_n/T$, we express (26) in the form

$$P_e(\alpha) \approx e^{-\rho(1 - 2|\alpha|)^2}, \qquad |\alpha| \le 1/2. \qquad (28)$$

In terms of the same normalized variables and the explicit values of $\mu_n$ and $\sigma_n$ [equation (24)] we write

$$G(\alpha) \sim \exp\left[-4\rho n\left(\alpha - \frac{1}{2n}\right)^2\right], \qquad (29)$$

which is valid when $n$ is large. In writing down (29) we set $\tau_0 = T/2$ (a worst initial guess).

Asymptotically, $p_1$ and $p_2$ behave as $e^{-n\rho}$ and, as we shall see shortly, can be neglected compared with the last term in (27). To conclude the error-rate calculation we evaluate

$$\int_{-T/2}^{T/2} P_e(\tau_n)\,G(\tau_n)\,d\tau_n = I_n(\rho) \sim \int_{-1/2}^{1/2} e^{-\rho(1-2|\alpha|)^2 - 4\rho n(\alpha - 1/(2n))^2}\,d\alpha = \int_0^{1/2} e^{-\rho f_1(\alpha)}\,d\alpha + \int_0^{1/2} e^{-\rho f_2(\alpha)}\,d\alpha, \qquad (30)$$

where

$$f_1(\alpha) = (1 - 2\alpha)^2 + 4n\left(\alpha - \frac{1}{2n}\right)^2$$

and

$$f_2(\alpha) = (1 - 2\alpha)^2 + 4n\left(\alpha + \frac{1}{2n}\right)^2.$$

Using a saddle-point technique to obtain an asymptotic approximation for the integrals, we find that, to exponential order,

$$I_n(\rho) \sim e^{-\rho M_1(n)} + e^{-\rho M_2(n)}, \qquad (31)$$

where

$$M_1(n) \sim 1 + \frac{1}{n} \qquad \text{and} \qquad M_2(n) \sim 1 - \frac{3}{n}.$$

Combining the above asymptotic results with $p_1$ and $p_2$, we obtain finally

$$P_e \sim \exp\left\{-\rho\left(1 - \frac{3}{n}\right)\right\} \qquad (32)$$

for $n$ and $\rho$ large. All the other terms have exponents larger than that of (32) and can therefore be neglected. For example, when $n = 30$, the degradation from ideal ($n \to \infty$) is only about 0.5 dB.

What this example shows is that for square-wave modulation in the presence of additive white Gaussian noise, bit timing can be reliably derived in approximately 30 bit intervals.
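As a quick numerical check of (30) through (32), the sketch below minimizes the exponents $f_1$ and $f_2$ over $0 \le \alpha \le 1/2$ by brute force and converts the $3/n$ exponent loss of (32) into decibels. The helper names are hypothetical. Under these assumptions, $f_1$ is minimized at $\alpha = 1/(n+1)$ with value $(n-1)^2/[n(n+1)] \approx 1 - 3/n$, while $f_2$ is smallest at the endpoint $\alpha = 0$ with value $1 + 1/n$.

```python
import math


def f1(alpha, n):
    """Exponent f_1(alpha) of (30)."""
    return (1 - 2 * alpha) ** 2 + 4 * n * (alpha - 1 / (2 * n)) ** 2


def f2(alpha, n):
    """Exponent f_2(alpha) of (30)."""
    return (1 - 2 * alpha) ** 2 + 4 * n * (alpha + 1 / (2 * n)) ** 2


def min_on_half_interval(f, n, points=200001):
    """Brute-force minimum of f(., n) over the grid 0 <= alpha <= 1/2."""
    return min(f(0.5 * i / (points - 1), n) for i in range(points))


def degradation_db(n):
    """SNR loss implied by (32): exponent rho (1 - 3/n) instead of rho."""
    return -10 * math.log10(1 - 3 / n)


n = 30
print(min_on_half_interval(f1, n), (n - 1) ** 2 / (n * (n + 1)), 1 - 3 / n)
print(min_on_half_interval(f2, n), 1 + 1 / n)
print(f"{degradation_db(n):.2f} dB")
```

For $n = 30$ the loss $10\log_{10}[1/(1 - 3/n)]$ is about 0.46 dB, consistent with the figure of roughly 0.5 dB quoted above.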


4.3 Synchronization of Bandlimited PAM 

We now consider a timing recovery algorithm for a bandlimited 
PAM signal. As in the previous section the synchronization problem 
will be cast as a regression problem. Our received signal is given by 
equation (1) 


$$V(t) = \sum_m a_m h(t - mT - \tau^*) + v(t), \qquad (33)$$

and as before the objective of the synchronizer is to accurately and rapidly estimate $\tau^*$. In order to extract information about $\tau^*$ we low-pass filter, differentiate, and sample the received signal. Hence the error signal is similar to that shown in Fig. 2, with the matched filter replaced by a low-pass filter. Thus the receiver does not need knowledge of the pulse $h(t)$. If we denote the derivative of $h(\cdot)$ by $g(\cdot)$, then the differentiated and sampled received signal is given by

$$V'(kT+\tau) = \sum_m a_m\, g[(k-m)T + \tau - \tau^*] + v'(kT+\tau) = \sum_m a_m\, g_{k-m}(\tau - \tau^*) + v_k, \qquad (34)$$
where $\tau$ is an arbitrary sampling time such that $|\tau| < T/2$, $g_{k-m}(x)$ denotes $g((k-m)T + x)$, and $v_k$ are samples† of the differentiated noise process $v'(t)$. As before, we let $\hat a_k$ denote the decision made at time $kT + \tau$. Assuming that the error rate is low enough that with high probability $\hat a_k = a_k$, we then have

$$\hat a_k V'(kT+\tau) \approx a_k^2\, g_0(\tau - \tau^*) + a_k\sum_{m\ne k} a_m\, g_{k-m}(\tau - \tau^*) + \hat a_k v_k = a_k^2\, g(\tau - \tau^*) + a_k\sum_{m\ne k} a_m\, g_{k-m}(\tau - \tau^*) + \hat a_k v_k, \qquad (35)$$

where we have noted that $g_0(\tau - \tau^*) = g(\tau - \tau^*)$. If we further assume that $\hat a_k$ is uncorrelated with $a_j$‡ for $j \ne k$, then averaging (35) gives

$$m(\tau) \triangleq E[\hat a_k V'(kT+\tau)] = \overline{a^2}\, g(\tau - \tau^*), \qquad (36)$$

where $\overline{a^2} = E[a_k^2]$.


Now for the typical impulse response $h(t)$ and its derivative $g(t)$, shown in Figs. 4 and 5, respectively, it is true that the (regression) equation

$$g(\tau - \tau^*) = 0, \qquad |\tau - \tau^*| < T/2, \qquad (37)$$

has the unique solution

$$\tau = \tau^*. \qquad (38)$$

Since the synchronization problem has been modeled as a regression problem, we again use a Robbins-Monro algorithm to sequentially estimate $\tau^*$. Denoting the $k$th estimate by $\tau_k$, we have the modified Robbins-Monro algorithm

$$\tau_{k+1} = \begin{cases} \tau_k + c_k[\hat a_k V'(kT+\tau_k)], & |\tau_k + c_k[\hat a_k V'(kT+\tau_k)]| < T/2,\\ \tau_k, & \text{otherwise}. \end{cases} \qquad (39)$$


A feedback implementation of the above algorithm is shown in Fig. 6, with D denoting a delay. It is again noted that the algorithm constrains the estimate to a region of width $T$. This is consistent with the observation that any actual sampling instant will always be within $T/2$ seconds of the desired instant $\tau^*$; i.e., we may "slip" $T$ seconds, but this is immaterial as far as estimating $\tau^*$ is concerned. It is by no means clear, a priori, that the above algorithm will converge rapidly, or will converge at all. In fact, the rest of this paper considers the conditions which must be satisfied for the above algorithm to converge, and the resulting rate of convergence.

† The dependence of the noise sample on the sampling offset $\tau$ is not shown, since it is assumed that the noise is stationary.

‡ As it will be if the $a_k$'s are independent and the receiver is supplied with an ideal reference.

Fig. 4—A typical impulse response h(t).
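To illustrate (39), the following sketch simulates the constrained Robbins-Monro update for binary PAM. Every numeric choice is a hypothetical assumption made for the illustration and is not taken from the text: a sinc pulse $h(t) = \sin(\pi t)/(\pi t)$ with $T = 1$, $\tau^* = 0.3$, a noise standard deviation of 0.05 on the differentiated samples, ideal decisions ($\hat a_k = a_k$), intersymbol interference truncated to 20 neighbours on each side, and the gain $c_k = c/k$ with $c = 1/a = 3/\pi^2$, where $a = -g'(0)$ for this pulse.

```python
import numpy as np


def g(t):
    """Derivative of the hypothetical pulse h(t) = sin(pi t)/(pi t), with T = 1."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    nz = np.abs(t) > 1e-9
    out[nz] = (np.cos(np.pi * t[nz]) - np.sinc(t[nz])) / t[nz]
    return out


def recover_timing(tau_star=0.3, tau0=0.0, n_symbols=2000, noise_std=0.05,
                   span=20, seed=1):
    """Sketch of the constrained update (39) with T = 1 and gain c_k = c/k.

    Assumes binary data, ideal decisions (a_hat_k = a_k), and c = 1/a,
    where a = -g'(0) = pi^2/3 for the sinc pulse.
    """
    rng = np.random.default_rng(seed)
    c = 3.0 / np.pi**2
    data = rng.choice([-1.0, 1.0], size=n_symbols + 2 * span + 1)
    tau = tau0
    for k in range(1, n_symbols + 1):
        i = k + span                            # array index of symbol a_k
        m = np.arange(i - span, i + span + 1)   # indices of neighbouring a_m
        # V'(kT + tau) from (34), truncated to |k - m| <= span
        v1 = data[m] @ g((i - m) + tau - tau_star) + noise_std * rng.normal()
        step = (c / k) * data[i] * v1           # c_k * a_hat_k * V'(kT + tau_k)
        if abs(tau + step) < 0.5:               # keep |tau_{k+1}| < T/2, as in (39)
            tau += step
    return tau


print(recover_timing())
```

Steps that would push the estimate outside $|\tau| < T/2$ are simply discarded, which is the constraint in (39); as the gain decays, the accepted steps shrink and the estimate settles near $\tau^*$.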


Vv. ANALYSIS OF THE SYNCHRONIZATION ALGORITHM 


5.1 The Error Equation 


Fig. 5—The derivative of h(t).

Fig. 6—A realization of the synchronization algorithm.

In order to evaluate the proposed synchronization algorithm, we will derive a difference equation for the mean-square estimation error $\overline{e_k^2}$,† where

$$e_k = \tau_k - \tau^*, \qquad (40)$$

and the overbar denotes expectation. In order to do this, we note from (39), neglecting for the moment the constraining portion of the algorithm, that


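$$e_{k+1} = \tau_{k+1} - \tau^* = \tau_k - \tau^* + c_k[\hat a_k V'(kT+\tau_k)]$$

$$= e_k + c_k\Big[a_k^2\, g(\tau_k - \tau^*) + a_k\sum_{m\ne k} a_m\, g_{k-m}(\tau_k - \tau^*) + \hat a_k v_k\Big]$$

$$= e_k + c_k\Big[g(e_k) + a_k\sum_{m\ne k} a_m\, g_{k-m}(e_k) + v_k\Big]. \qquad (41)$$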


We note that $g(\cdot)$ is such that, on the average, the error is decreased at each iteration, and once the estimation error is small* we need only keep first-order terms in a Taylor series expansion of $g_{k-m}(e_k)$ about $(k - m)T$, i.e.,

$$g_{k-m}(e_k) \approx g_{k-m} + g'_{k-m}\,e_k, \qquad (42)$$

where $g'_{k-m}$ denotes the derivative of $g(\cdot)$ evaluated at $(k - m)T$.
Combining (41) and (42) yields the (approximate) first-order stochastic 
difference equation for the evolution of the error, 


$$e_{k+1} = [1 + c_k g'_0]\,e_k + c_k a_k\sum_{m\ne k} a_m g'_{k-m}\,e_k + c_k\Big[a_k\sum_{m\ne k} a_m g_{k-m} + v_k\Big]. \qquad (43)$$

† We use the mean-square estimation error as a measure of performance because the estimate is a nonlinear one, and thus the probability of error cannot be computed.

* Under this assumption we can certainly neglect the possibility that $\tau_{k+1} = \tau_k$.

Before studying the behavior of (43) we introduce the following notation:

$$g'_0 = -a, \qquad (44a)$$

$$\delta_k = c_k\Big(g'_0 + a_k\sum_{m\ne k} a_m g'_{k-m}\Big), \qquad (44b)$$

$$\gamma_k = 1 + \delta_k, \qquad (44c)$$

$$Q_k = a_k\sum_{m\ne k} a_m g_{k-m} + v_k, \qquad (44d)$$

and using the above we rewrite (43) as

$$e_{k+1} = \gamma_k e_k + c_k Q_k. \qquad (45)$$

Thus the error obeys a stochastic difference equation in which the gain ($\gamma_k$) and the driving term ($Q_k$) are correlated. It is important to note that for the system described by (45), the probability density of the present error $e_k$ does not depend solely on a finite number of past data symbols $a_j$, but depends on all past and future values. This makes an exact analysis of the mean-square error impossible. However, if we assume that $\gamma_k$ and $Q_k$ are both independent sequences, then $e_k$ depends solely on past $\gamma_j$ and $Q_j$, and we can obtain a bound on the mean-square error.* Squaring and averaging both sides of (45) gives

$$E[e_{k+1}^2] = E[\gamma_k^2 e_k^2] + 2c_k E[\gamma_k Q_k e_k] + c_k^2 E[Q_k^2]. \qquad (46)$$


We now proceed to bound each of the terms on the right-hand side of (46). If we assume that the "eye" of the twice-differentiated impulse response is open, i.e.,

$$a > \sum_{m\ne 0}|g'_m|, \qquad (47)$$

then

$$\gamma_k = 1 - c_k\Big(a - a_k\sum_{m\ne k} a_m g'_{k-m}\Big) \le 1 - c_k\Big(a - \sum_{m\ne 0}|g'_m|\Big) \qquad (48a)$$

$$= 1 - c_k\beta, \qquad (48b)$$

where $\beta$ denotes $a - \sum_{m\ne 0}|g'_m|$. Using the above assumption and the boundedness of the error, we have

$$|E[\gamma_k Q_k e_k]| = |E[e_k]\,E[\gamma_k Q_k]| \le (T/2)\,|E[\gamma_k Q_k]|, \qquad (49)$$

* Despite much effort we have been unable to proceed without this assumption, but since the results which follow are intuitively satisfying and provide insight into this difficult problem, they have been included in the paper.
and, due to the independence of the data bits,

$$E[\gamma_k Q_k] = E\Big[\Big(1 - c_k a + c_k a_k\sum_{m\ne k} a_m g'_{k-m}\Big)\Big(a_k\sum_{i\ne k} a_i g_{k-i} + v_k\Big)\Big] = c_k\sum_{m\ne 0} g_m g'_m = c_k\,\frac{2}{T}\,G, \qquad (50)$$

where† $G$ denotes $(T/2)\sum_{m\ne 0} g_m g'_m$. Finally we have

$$E[Q_k^2] = \sigma^2 + \sum_{m\ne 0} g_m^2 = \sigma^2 + P, \qquad (51)$$

where $P$ denotes $\sum_{m\ne 0} g_m^2$. Letting

$$A_k = E[e_k^2], \qquad (52)$$

and combining (46)–(52), we have the iterative bound

$$A_{k+1} \le (1 - \beta c_k)^2 A_k + c_k^2 M \qquad (53)$$

on the mean-square error, where $M$ is the sum of $G$ and $\sigma^2 + P$. Although several assumptions have been made in obtaining (53), it is believed that the effect of the salient quantities upon the synchronization algorithm has been preserved. We now proceed to find the gain sequence which minimizes the bound of (53).


5.2 The Optimum Gain Sequence 


We now find the sequence of gains, $c_k^*$, which minimizes the right-hand side (RHS) of (53) for fixed $A_k$. Since we minimize a bound on the mean-square error at every iteration, this is a min-max procedure. We first find the optimum gain sequence in terms of $A_k$, and then, by simultaneously iterating this equation and the bound of (53), we show that $c_k^*$ is proportional to $1/k$ for large $k$. We begin by setting to zero the derivative of the RHS of (53) with respect to $c_k$, i.e.,

$$-\beta(1 - \beta c_k)A_k + M c_k = 0$$

or

$$c_k^* = \frac{\beta A_k}{M + \beta^2 A_k}. \qquad (54)$$

Substituting (54) in (53), we have

$$A_{k+1} \le (1 - \beta c_k^*)^2 A_k + M (c_k^*)^2 = \frac{M A_k}{M + \beta^2 A_k} = \frac{M}{\beta}\,c_k^*, \qquad (55)$$

† It should be noted that if $h(t)$ is an even function of time (with respect to the origin), then $g(t)$ and $g'(t)$ will be odd and even time functions, respectively, and $G$ will be zero.
or, since (54) gives $A_{k+1} = M c_{k+1}^*/[\beta(1 - \beta c_{k+1}^*)]$,

$$\frac{c_{k+1}^*}{1 - \beta c_{k+1}^*} \le c_k^*. \qquad (56)$$

Now if

$$1 - \beta c_{k+1}^* \ge 0, \qquad (57)$$

then we have the relation

$$c_{k+1}^* \le \frac{c_k^*}{1 + \beta c_k^*}, \qquad (58)$$

which can be iterated to give†

$$c_k^* \le \frac{c_0^*}{1 + \beta c_0^* k} \qquad (59a)$$

$$= \frac{\beta A_0}{M + \beta^2 A_0(k+1)}, \qquad (59b)$$

where

$$c_0^* = \frac{\beta A_0}{M + \beta^2 A_0}, \qquad (60)$$

and $A_0$ is the initial error variance. Henceforth we will interpret the sequence $c_k^*$, specified by (59) and (60) with the inequality replaced by an equality, as the optimum gain sequence. Combining (55), (59), and (60), we see that the mean-square error is bounded by

$$A_k \le \frac{M A_0}{M + \beta^2 A_0 k}, \qquad (61a)$$

which for large $k$ becomes

$$A_k \le \frac{M}{\beta^2}\,\frac{1}{k}. \qquad (61b)$$

Thus we see that asymptotically the minimized mean-square error is bounded by a term which decays as $1/k$ and is inversely proportional to a signal-to-noise type ratio ($\beta^2/M$).
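The recursion behind (54) through (61) is easy to check numerically. Iterating (53) with equality, with the gain recomputed from (54) at every step, should reproduce the closed forms (59b) for $c_k^*$ and (61a) for $A_k$. The values $A_0 = 0.25$, $M = 2$, $\beta = 1.5$ below are arbitrary illustrative choices, not values from the text.

```python
def optimum_gain_iteration(A0, M, beta, n_iter):
    """Iterate the optimum gain (54) and the bound (53), taken with equality.

    Returns the gains c*_0 .. c*_{n_iter-1} and the bounds A_0 .. A_{n_iter}.
    """
    gains, bounds = [], [A0]
    A = A0
    for _ in range(n_iter):
        c = beta * A / (M + beta**2 * A)          # optimum gain (54)
        A = (1 - beta * c) ** 2 * A + c**2 * M    # bound (53) with equality
        gains.append(c)
        bounds.append(A)
    return gains, bounds


A0, M, beta = 0.25, 2.0, 1.5
gains, bounds = optimum_gain_iteration(A0, M, beta, 50)
for k in (0, 10, 49):
    closed_gain = beta * A0 / (M + beta**2 * A0 * (k + 1))   # (59b)
    closed_bound = M * A0 / (M + beta**2 * A0 * k)           # (61a)
    print(k, gains[k], closed_gain, bounds[k], closed_bound)
```

The agreement is exact (up to rounding) because substituting (54) into (53) gives $1/A_{k+1} = 1/A_k + \beta^2/M$, so $1/A_k$ grows linearly in $k$.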

The optimum gain, as given by (59b), depends upon the parameters $A_0$, $M$, and $\beta$. Since these quantities are generally unknown, it is tempting to replace $c_k^*$ by its asymptotic (large $k$) value $1/\beta(k+1)$. Caution must be exercised in making this approximation: since $M \gg \beta^2 A_0$ implies that the optimum gain sequence is essentially constant for many iterations, substitution of a decaying sequence could lead to an unreliable estimate (we will consider this point in Section 5.4). However, if $\beta^2 A_0 \gg M$,‡ then $c_k^* \approx 1/\beta(k+1)$ and we have only one unknown parameter. One possibility is to replace $\beta$ by an estimate; techniques of this sort are called adaptive estimation procedures. We now sketch a particular adaptive scheme.

† Note that $\beta c_k^* \le \beta c_0^*/(1 + \beta c_0^* k) \le 1$, thus satisfying (57).


5.3 An Adaptive Synchronization Algorithm 
We now give a method for recursively estimating $\beta$, which can then be incorporated in an adaptive synchronization scheme. Since

$$\beta = a - \sum_{m\ne 0}|g'_m|,$$

we desire a function of the received data which has $\beta$ as its average value. We note that from (34) we have

$$E[\hat a_k V''(kT+\tau)] = g'(\tau - \tau^*) \approx g'_0 \qquad (62a)$$

(where the approximation is for small $\tau - \tau^*$), and

$$E[\hat a_{k-m} V''(kT+\tau)] = g'_m(\tau - \tau^*) \approx g'_m. \qquad (62b)$$

We can then estimate $\beta$ by using a recursive stochastic approximation algorithm of the type discussed in Section 4.1. Such a scheme would twice differentiate the incoming data and then multiply the data sample by as many of the previous decisions as there are significant nonzero samples in the impulse response. Since even an approximate analysis of the above algorithm is hopelessly complex, we will instead consider the effect of using a gain of the form $c/k$, where $c$ is a constant to be chosen.
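A minimal sketch of the decision-directed $\beta$ estimator suggested by (62) follows. It assumes ideal decisions, sampling exactly at $\tau = \tau^*$, and a hypothetical set of samples $g'_m$ of the twice-differentiated pulse ($g'_0 = -1$, $g'_{\pm 1} = 0.2$, $g'_{\pm 2} = -0.05$, so that $\beta = 1 - 0.5 = 0.5$); none of these values come from the text. Each $g'_m$ is estimated by a Robbins-Monro recursion with gain $1/n$, which reduces to a running average of $\hat a_{k-m}V''(kT)$.

```python
import numpy as np


def estimate_beta(g1_taps, n_symbols=20000, noise_std=0.1, seed=2):
    """Decision-directed estimate of beta = a - sum_{m != 0} |g'_m|, per (62).

    g1_taps maps m -> g'_m (hypothetical samples of the twice-differentiated
    pulse); ideal decisions a_hat = a and tau = tau* are assumed.
    """
    rng = np.random.default_rng(seed)
    span = max(abs(m) for m in g1_taps)
    data = rng.choice([-1.0, 1.0], size=n_symbols + 2 * span)
    est = {m: 0.0 for m in g1_taps}
    for n, k in enumerate(range(span, n_symbols + span), start=1):
        # V''(kT) = sum_m a_{k-m} g'_m + noise
        v2 = sum(data[k - m] * gm for m, gm in g1_taps.items())
        v2 += noise_std * rng.normal()
        for m in est:
            # Robbins-Monro with gain 1/n: running mean of a_{k-m} V''(kT)
            est[m] += (data[k - m] * v2 - est[m]) / n
    a_hat = -est[0]                                # (44a): g'_0 = -a
    beta_hat = a_hat - sum(abs(v) for m, v in est.items() if m != 0)
    return beta_hat, est


taps = {0: -1.0, 1: 0.2, -1: 0.2, 2: -0.05, -2: -0.05}   # hypothetical; beta = 0.5
beta_hat, g1_hat = estimate_beta(taps)
print(round(beta_hat, 3))
```

The estimate of $\beta$ could then be used to set the gain $1/\beta(k+1)$; with real decisions and an unconverged timing estimate, the averages in (62) are only approximate.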


5.4 A Suboptimum Gain 

We consider the mean-square error, as given by (53), with $c_k = c/k$. This gain is chosen since the optimum gain is asymptotically of this form. Care must be taken in choosing $c$, since the mean-square error will be shown to be a sensitive function of this parameter. Iterating (53) gives

$$A_{k+1} \le \prod_{i=1}^{k}(1 - \beta c_i)^2 A_1 + \sum_{i=1}^{k} c_i^2 M \prod_{j=i+1}^{k}(1 - \beta c_j)^2. \qquad (63)$$

‡ A condition one would expect to be satisfied in practice.

The inequality

$$1 - x \le e^{-x} \qquad (64)$$

gives

$$\prod_{j=i+1}^{k}(1 - \beta c_j)^2 \le \exp\Big(-2\sum_{j=i+1}^{k}\beta c_j\Big), \qquad (65)$$

and noting that

$$\sum_{j=i+1}^{k}\frac{1}{j} \ge \int_{i+1}^{k+1}\frac{dx}{x} = \ln\frac{k+1}{i+1}$$

results in

$$\prod_{j=i+1}^{k}(1 - \beta c_j)^2 \le \Big(\frac{i+1}{k+1}\Big)^{2\beta c}. \qquad (66)$$
We can see that the transient behavior of the mean-square error, which is specified by the first term on the RHS of (63), will be of the form $(1/k)^{2\beta c}$. The other component of the mean-square error will be (approximately) bounded by

$$\sum_{i=1}^{k}\frac{c^2 M}{i^2}\Big(\frac{i+1}{k+1}\Big)^{2\beta c} \approx \frac{c^2 M}{(k+1)^{2\beta c}}\int_1^{k+1} x^{2\beta c - 2}\,dx = \frac{c^2 M}{2\beta c - 1}\Big[\frac{1}{k+1} - \frac{1}{(k+1)^{2\beta c}}\Big], \qquad (67a)$$

which results in

$$A_{k+1} \lesssim \frac{A_1}{(k+1)^{2\beta c}} + \frac{c^2 M}{2\beta c - 1}\Big[\frac{1}{k+1} - \frac{1}{(k+1)^{2\beta c}}\Big] \qquad (67b)$$

as a bound on the mean-square error. If $2\beta c > 1$, then for large $k$ the above bound becomes

$$A_{k+1} \lesssim \frac{M c^2}{(2\beta c - 1)(k+1)}, \qquad (67c)$$

and the mean-square error will converge at the optimum rate ($1/k$). It is seen that care must be taken in selecting $c$, since for $c \ge 1/(2\beta)$ (i.e., for $2\beta c \ge 1$) the quantity $Mc^2/(2\beta c - 1)$ has a minimum† at $c = 1/\beta$, and is infinite at both $c = 1/(2\beta)$ and $c = \infty$. Thus a very small step size ($c < 1/(2\beta)$) will result in an mse which converges at a less than optimum rate, while large step sizes ($c > 1/(2\beta)$) will result in a mean-square error which, while converging at the optimum rate, may be quite large for many iterations. The sensitivity of the above bound with respect to $c$ may make the use of an adaptive procedure (which estimates $\beta$) advisable.

† With $c = 1/\beta$, $A_k \le (M/\beta^2)(1/k)$, which is the optimum asymptotic rate of convergence.
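The sensitivity to $c$ can be illustrated by iterating (53) with equality and $c_k = c/k$, then comparing $(k+1)A_{k+1}$ with the constant $Mc^2/(2\beta c - 1)$ of (67c). The values $M = 1$, $\beta = 2$, the starting value $A_1 = 1$ and the number of iterations are illustrative assumptions.

```python
def iterate_suboptimal(A1, M, beta, c, n_iter):
    """Iterate (53), taken with equality, with c_k = c/k; returns A_{n_iter+1}."""
    A = A1
    for k in range(1, n_iter + 1):
        ck = c / k
        A = (1 - beta * ck) ** 2 * A + ck**2 * M
    return A


def asymptotic_constant(M, beta, c):
    """Coefficient of 1/k in (67c); valid only for 2*beta*c > 1."""
    return M * c**2 / (2 * beta * c - 1)


M, beta, K = 1.0, 2.0, 100_000
for c in (0.75 / beta, 1.0 / beta, 2.0 / beta):
    scaled = (K + 1) * iterate_suboptimal(1.0, M, beta, c, K)
    print(f"c = {c:.3f}: (k+1)A = {scaled:.4f}, (67c): {asymptotic_constant(M, beta, c):.4f}")

# The constant M c^2 / (2 beta c - 1) is smallest at c = 1/beta.
grid = [0.26 + 0.001 * i for i in range(2000)]   # starts just above 1/(2 beta)
c_best = min(grid, key=lambda c: asymptotic_constant(M, beta, c))
print(c_best, 1.0 / beta)
```

For $c$ close to $1/(2\beta)$ the product $(k+1)A_{k+1}$ approaches its limit only slowly (the transient decays as $k^{1 - 2\beta c}$), which is one way to see why the choice of $c$ matters in practice.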


5.5 An Example 
Consider the (minimum bandwidth) pulse

$$h(t) = A\,\frac{\sin \pi W t}{\pi W t}, \qquad (68)$$

where $W = 1/T$. It is easy to show that


B= 3A@Wy 


M 


2 
+5 Aw’, 


thus from (61b) the percentage minimized mean-square error is 


bounded by 
o (xy 
AM i+ (5) 7W) 


T < Pek ~ Lh 
For a 30 dB signal-to-noise ($A/\sigma$) ratio, and with $W = 3000$ Hz, we see that $A_k/T^2$ is less than 0.01 for $k \ge 10$. In other words, after 10 symbols have been received, the above synchronization algorithm reduces the mean-square error to less than $0.01T^2$.





(69) 
Optimum Equalization and the Effect of
Timing and Carrier Phase on Synchronous 
Data Systems 
(Manuscript received December 16, 1970)


The minimum mean-square error (M.M.S.E.) at the receiver output generally depends upon the sampling instant and demodulating carrier phase for synchronous data systems. In this study, it is shown that for certain single-sideband data systems with no excess bandwidth (e.g., class IV and class V partial-response systems), the M.M.S.E. is completely independent of the sampling instant and demodulating carrier phase if the receiver contains an infinitely long transversal filter equalizer. Practically speaking, computer calculations indicate that for a class IV system operating in the presence of typical received signal-to-noise ratios, a 19-tap equalizer is sufficient to make the M.M.S.E. relatively insensitive to the sampling instant and demodulating carrier phase. Thus, for such data systems, a significant reduction in the receiver complexity, and possibly in the start-up time, may be obtained, because no time is spent acquiring timing and carrier phase.

The optimum infinite-length equalizer for synchronous data systems with a fixed channel is also calculated for two different conditions: (1) the minimization of the output noise plus mean-square intersymbol interference, and (2) the minimization of the output noise subject to the constraint that the equalizer forces the intersymbol interference to zero. Explicit expressions for the optimum equalizer and the M.M.S.E. are obtained. Satisfying condition (1) results in the lower value of M.M.S.E.; however, the M.M.S.E.s for these two criteria are almost equivalent for either large signal-to-noise ratios or small slope of the amplitude-frequency characteristics of the channel.


I. INTRODUCTION 


In synchronous data systems, the transmission rates are frequently limited by the intersymbol interference which is caused by the amplitude and phase distortion in the transmission channel. In order to reduce the effect of the intersymbol interference, it is necessary to equalize the channel before the data can be transmitted.

Several automatic equalization schemes using transversal filters 
have been devised for such data systems.1* Chang® has investigated 
the effect of the sampling instant and carrier phase on the minimum 
mean-square intersymbol interference for a noiseless system with a 
finite-length transversal equalizer. In principle, we can make the 
mean-square intersymbol interference arbitrarily small by using an 
infinitely long transversal filter, provided that the tap-gain settings 
can be made arbitrarily accurate. However, the equalizer which forces 
the intersymbol interference to zero may not be the most desirable one 
when noise is present. 

We have found, in this study, the optimum infinite-length mean-square equalizer for such synchronous data systems with a fixed channel. Two different cases are considered, corresponding to the following optimality criteria: (1) the minimization of the output noise plus mean-square intersymbol interference, and (2) the minimization of the output noise power subject to the constraint that the equalizer forces the intersymbol interference to zero.

Explicit expressions for the optimum equalizer and the M.M.S.E. are obtained. We have also found that for certain types of data systems (S.S.B. class IV or class V partial-response systems) the M.M.S.E. does not depend upon the sampling instant and demodulating carrier phase. Thus, for such data systems, it may be possible to reduce significantly the receiver complexity and the start-up time.

A computer program has been written for a class IV system which 
is equipped with a finite-length mean-square equalizer. The number 
of taps needed to achieve near optimum performance in practical 
situations can then be determined. Throughout, additive Gaussian 
noise and independence of information digits are assumed. 


II. GENERAL CONSIDERATIONS 


The optimality criteria will be formulated in this section. A simplified block diagram of a general digital data system is shown in Fig. 1. We assume that every $T$ seconds an impulse of amplitude $a_n$ ($a_n \in \{2M+1, \ldots, 1, -1, \ldots, -(2M+1)\}$) is transmitted to the input of the system. The $a_n$ are assumed to be identically distributed independent random variables.

In the absence of channel noise, for a sequence of input impulses

$$\sum_{n=-\infty}^{\infty} a_n\,\delta(t - nT), \qquad (1)$$

the corresponding sequence at the receiver equalizer output is

$$\sum_{n=-\infty}^{\infty} a_n\,y(t - nT, \theta), \qquad (2)$$

where $y(t, \theta)$ is the system impulse response with demodulating carrier phase $\theta$.*

Fig. 1—Digital data system, simplified block diagram.

With noise, the output at the sampling instant $t_0$ with a demodulating carrier phase $\theta_0$ is

$$V = a_0\,y(t_0, \theta_0) + \sum_{n\ne 0} a_n\,y(t_0 - nT, \theta_0) + n(t_0) \qquad (3a)$$

or

$$V = a_0\,y_0(\theta_0) + \sum_{n\ne 0} a_n\,y_n(\theta_0) + n_0, \qquad (3b)$$

where the terms $y_n(\theta_0)$, $n \ne 0$, represent intersymbol interference and $y_0(\theta_0)$ is the output value at the main sample point.

One useful measure of the performance of such a data system is the mean-square error (M.S.E.). In this study, we define the normalized M.S.E. at the sampling instant $t_0$ with a demodulating carrier phase $\theta_0$ to be

$$[\text{M.S.E.}]_{t_0,\theta_0} = \frac{E\{[V - a_0 y_0(\theta_0)]^2\}}{E\{a_0^2 y_0^2(\theta_0)\}} = \frac{E\{n_0^2\} + E[a_n^2]\sum_{n\ne 0} y_n^2(\theta_0)}{E\{a_0^2 y_0^2(\theta_0)\}}, \qquad (4)$$

where $E[X]$ means the expectation of random variable $X$. For binary systems, $E[a_n^2]$ is equal to 1. The variance of the noise at the equalizer output (see the Appendix) is

$$\sigma_0^2 = E(n_0^2) = \frac{\sigma^2 T}{\pi}\int_0^{\pi/T}\Phi_{eq}(\omega)\,|E(\omega)|^2\,d\omega, \qquad (5)$$

where $\sigma^2 T$ is the power spectral density of the input white noise, $\Phi_{eq}(\omega)$ is the square of the equivalent baseband receiver filter characteristic, and $E(\omega)$ is the transfer function of the equalizer.

* $\theta = 0°$ corresponds to the phase of the frequency component of the received spectrum at the carrier frequency.
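For a given set of sampled responses, the normalized M.S.E. of (4) is a one-line computation once the output noise variance (5) is known. The sketch below assumes zero-mean, independent data symbols, as the text does; the response samples and noise variance in the example are hypothetical.

```python
def normalized_mse(y, noise_var, symbol_power=1.0):
    """Normalized M.S.E. of (4) for independent, zero-mean data symbols.

    y maps n -> y_n(theta_0), the sampled system response at t_0 - nT
    (n = 0 is the main sample point); noise_var is E{n_0^2}, e.g. from (5);
    symbol_power is E[a_n^2], which equals 1 for binary data.
    """
    isi_power = sum(v**2 for n, v in y.items() if n != 0)
    return (noise_var + symbol_power * isi_power) / (symbol_power * y[0] ** 2)


# Hypothetical sampled response with two intersymbol-interference terms, binary data.
response = {0: 1.0, 1: 0.1, -1: -0.2}
print(normalized_mse(response, noise_var=0.01))
```

Because the main-sample power appears in the denominator, scaling the whole response and the noise together leaves the normalized M.S.E. unchanged, which is why the optimization below can fix $y_0(\theta_0)$ at a constant.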

Now we wish to design two optimum equalizers for two different conditions. In the first, the $[\text{M.S.E.}]_{t_0,\theta_0}$ is minimized subject to the constraint that $y_0(\theta_0)$ is a constant,

$$C_1 = y_0(\theta_0) = \operatorname{Re}\frac{1}{\pi}\int_0^{\pi/T} Y_{eq}(\omega, \theta_0)\,E(\omega)\,e^{j\omega t_0}\,d\omega, \qquad (6)$$

where $Y_{eq}(\omega, \theta_0)$ (see the Appendix) is the equivalent baseband system transfer function for sampling instant $t_0$ and demodulating carrier phase $\theta_0$. In the second case the optimum equalizer is found by minimizing the variance of the output noise subject to the constraint equations (6) and

$$0 = y_n(\theta_0) = \operatorname{Re}\frac{1}{\pi}\int_0^{\pi/T} Y_{eq}(\omega, \theta_0)\,E(\omega)\,e^{j\omega(t_0 - nT)}\,d\omega, \qquad n \ne 0. \qquad (7)$$
0 


Ill. MINIMIZATION OF NOISE PLUS INTERSYMBOL INTERFERENCE 


3.1 A General Binary Data System 
The details of the minimization of the $[\text{M.S.E.}]_{t_0,\theta_0}$ given by equation (4), subject to the constraint equation (6), are given in the Appendix. For a binary data system, the optimum equalizer, $E_0(\omega)$, for sampling instant $t_0$ and demodulating carrier phase $\theta_0$, is


{¥ eq @, O)e’°'*}* 
Wola. = TeAy eq @) + | Y eq @, 0) 
: C1 
T — | Y eq (, Oo) |” 
o owe (a) + | ¥ eq (, &) 
where {X}* means complex conjugate of X. It follows that the 


M.M.S.E. for the corresponding sampling instant and demodulating 
carrier phase is 


[IM.MS.E.]i., 








(8) 











z u ee (9) 
Sa | ¥ eq @, 4) | 
ow eq w) + | Y eq @, 6) | 
$|Y_{eq}(\omega, \theta_0)|^2$ is, in general, a function of $t_0$ and $\theta_0$. It follows that $[\text{M.M.S.E.}]_{t_0,\theta_0}$ depends generally upon $t_0$ and $\theta_0$. Hence there exist an optimum sampling instant $\hat t_0$ and demodulating carrier phase $\hat\theta_0$ such that

$$[\text{M.M.S.E.}]_{\hat t_0,\hat\theta_0} = \min_{\text{all } t_0,\theta_0}[\text{M.M.S.E.}]_{t_0,\theta_0}. \qquad (10)$$

However, there exist cases where $|Y_{eq}(\omega, \theta_0)|^2$ is not a function of $t_0$ and $\theta_0$; for example, the $|Y_{eq}(\omega, \theta_0)|^2$ of a single-sideband system with no excess bandwidth is independent of the sampling instant $t_0$ and the demodulating carrier phase $\theta_0$ (see the Appendix). The SSB class IV partial-response system represents another example. Therefore, if an infinite equalizer is available, no loss in performance occurs when arbitrary $t_0$ and $\theta_0$ are used.


3.2 SSB Class IV Partial-Response System 

In this section we will show that the M.M.S.E. for a class IV 
partial-response system is independent of sampling instant and 
demodulating carrier phase. The M.M.S.E. will be computed for 
certain typical telephone channels. Such a data system has been fully 
described in Reference 7. 

The transfer functions of the transmitter and receiver filters are

$$S(\omega_c - \omega) = R(\omega_c - \omega) = \begin{cases} (2T\sin T|\omega|)^{1/2}, & 0 < |\omega| < \pi/T,\\ 0, & \text{otherwise}. \end{cases} \qquad (11)$$
The M.S.E. for the partial-response system is defined to be
[oo + (y:(4) + Yy—1(8o))” an ps Yn o) | 
MS.E.J|c¢.:..4. = Et 12 
pene [(uiC@.) — y-al0))/21° oi 


The constraint equations are 


Aa a ~j0o E07, \ fa (totP) 
C = y:(%) = Re oe —j(sin wT)T(w, — we’ °E@)e dw 
0 


(13) 
and 


C — 2 = y-1(4o) 


x/T . 
= Re = [ -igin ote. — we BQ do (14) 
0 
where Re[X] means the real part of X, $\Gamma(\omega)$ is the transfer function of the channel, and $C$ is a constant.

For a given constant $C$, sampling instant $t_0$, and demodulating carrier phase $\theta_0$, the optimum equalizer is


[C(A, — A,) + 24, 


Ae [—j2T(sin T | w |)-T, — w) 


°€ 


+ 


~ido piv (tot D1 x 


C(A, —— A.) vax 2A, 
[Eo(w) ]eo.00.¢ = At 7 A; 





[—j2T (in T | w |)-T@. — a) 





| ag iigt ASE) le 


"oT sin T [wo | + 47’ sin’? T | w |-| T@, — ) |? 


0 otherwise, 

(15) 
where $[X]^*$ means the complex conjugate of $X$, and $A_1$ and $A_2$ are given by equations (41a) and (41b), respectively.

It follows that the corresponding M.M.S.E. is


[M.M.8.E.Jo.1..0 
1 r/T 


ae [o72T sin wT + 47” sin’ wT'-| Tw, — w) |] 


-| BoW)]c.t0.0. | dw + 2C* — 4C. (16) 


Since $|[E_0(\omega)]_{C,t_0,\theta_0}|^2$ is independent of $t_0$ and $\theta_0$, the M.M.S.E. does not depend upon $t_0$ and $\theta_0$. Equation (16) can be further minimized over all possible values of $C$. The optimum solution, $C = 1$, results in the minimum of the M.S.E. for the class IV partial-response system equipped with a mean-square equalizer.





2 — —— 
A, ss As 
= min[M.MS.EF.]¢ (17) 


all ¢ 


[M.M.S.E.]e-: 2 





We may thus conclude that for the SSB class IV partial-response system the M.M.S.E. does not depend upon $t_0$ and $\theta_0$, since the constants $A_i$ do not depend on $t_0$ and $\theta_0$. We may therefore choose the sampling instant and the demodulating carrier phase arbitrarily, with no loss of optimality, as long as the equalizer transversal filter is infinitely long.

As an example, we assume that the equivalent baseband channel
characteristics are linear in amplitude-frequency response and
quadratic in delay-frequency response, as shown in Fig. 2. The
delay at the Nyquist frequency of π/T rad/s is taken to be βₘT
seconds. The transfer function of the channel is


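$$T(\omega_c-\omega) = \left(1 - \alpha\,\frac{|\omega|}{\pi/T}\right)\exp\left[-j\,\frac{\omega^3\beta_m T^3}{3\pi^2}\right], \qquad 0 \le |\omega| \le \frac{\pi}{T}. \qquad (18)$$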

The M.M.S.E.s computed by equation (17) for various α, βₘT, and
σᵢ² are given in Table I. For these calculations the transmitted signal
power is fixed at 4/7 watts. 
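The M.M.S.E. of (17) depends on the channel only through |T(ω_c − ω)|, so the entries of Table I can be re-derived numerically. The sketch below is a minimal illustration, not the authors' program. It evaluates A₁ and A₂ from (41a) and (41b) of the Appendix with composite Simpson's rule and then applies (17). The function names, the number of subintervals, and the use of Simpson's rule are assumptions made for illustration. σᵢ² is taken in watts/Hz, and T = 1 s as in Section 3.3.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson's rule for f on [a, b]; n is forced to be even."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def channel_gain_sq(w, alpha, T=1.0):
    """|T(w_c - w)|^2 for the channel of eq. (18).

    The quadratic-delay factor has unit magnitude, so beta_m drops out.
    """
    return (1.0 - alpha * abs(w) / (math.pi / T)) ** 2


def class4_mmse(sigma2, alpha, T=1.0, n=20000):
    """Return (A1, A2, M.M.S.E.) from eqs. (41a), (41b) and (17), i.e. C = 1."""

    def ratio(w):
        # Common integrand 2T sin(wT)|T|^2 / (sigma_i^2 + 2T sin(wT)|T|^2)
        s = 2.0 * T * math.sin(w * T) * channel_gain_sq(w, alpha, T)
        return s / (sigma2 + s)

    upper = math.pi / T
    a1 = (T / math.pi) * simpson(ratio, 0.0, upper, n)
    # Only the real part of e^{j2wT}, i.e. cos(2wT), survives in A2
    a2 = (T / math.pi) * simpson(lambda w: ratio(w) * math.cos(2.0 * w * T), 0.0, upper, n)
    return a1, a2, 2.0 / (a1 - a2) - 2.0


if __name__ == "__main__":
    # sigma_i^2 in watts/Hz; Table I lists it in units of 10^-2 watts/Hz
    for alpha in (0.1, 0.9):
        for sigma2 in (0.04, 0.02, 0.004, 0.002):
            _, _, mmse = class4_mmse(sigma2, alpha)
            print(f"alpha={alpha}  sigma_i^2={sigma2:g}  M.M.S.E.={mmse:.5f}")
```

With α = 0.1 and σᵢ² = 0.04 watts/Hz, the sketch should come out close to the 5.621 × 10⁻² entry of Table I. The delay parameter βₘ never enters, which is consistent with the text's conclusion that the infinite-length M.M.S.E. does not depend on the delay distortion.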


3.3 Computer Results for Class IV Partial-Response System with a 
Finite Length Equalizer 


A particular case has been calculated by the computer for a class IV
partial-response system with a finite-length equalizer. The following
assumptions are made:

(i) The transfer function of the channel is

$$T(\omega_c-\omega) = \left(1 - 0.1\,\frac{|\omega|}{\pi/T}\right)\exp\left[-j\,\frac{\omega^3\beta_m T^3}{3\pi^2}\right], \qquad 0 \le |\omega| \le \pi/T. \qquad (19)$$

(ii) The signal-to-noise ratio at the receiver input is assumed to be
21 dB.
(iii) The delay at the Nyquist frequency is taken to be 1 second and
the baud is assumed to be 1 symbol/second.

[Figure: amplitude and delay (in seconds) of the channel versus frequency.]
Fig. 2—Equivalent baseband channel characteristics.

TABLE I—M.M.S.E. Computed by Equation (17)

σᵢ² (10⁻² watts/Hz)   M.M.S.E. (10⁻²)                   M.M.S.E. (10⁻²)
                      α = 0.1, βₘT = 1, T = 1           α = 0.9, βₘT = 1, T = 1
4                     5.621                             22.57
2                     2.818                             12.14
0.4                   0.564                             2.74
0.2                   0.282                             1.41


Forty distinct combinations of sampling instants (0, 0.2, 0.4, 0.6, 
0.8) and demodulating carrier phase (90°, 60°, 30°, 15°, 0°, —15°, 
—30°, —60°) have been tried with a 19-tap mean-square equalizer. 
The results and the minimum of the mean-square error are shown in 
Figures 3 through 10. It can be seen that the M.S.E. for most combina- 
tions is near the minimum achieved by the infinite equalizer. Practically 
speaking, in this example the system performance is acceptable (with 
error-rate upper bounded® by 107°) with a 19-tap mean-square equalizer 
for all 40 distinct combinations. 


IV. MINIMIZATION OF NOISE SUBJECT TO THE CONSTRAINT THAT THE 
EQUALIZER FORCES THE INTERSYMBOL INTERFERENCE TO ZERO 


4.1 A General Binary Data System 
The minimization of the output noise power [see equation (5)],

$$\sigma_o^2 = \frac{\sigma_i^2 T}{\pi}\int_0^{\pi/T}\psi_{eq}(\omega)\,|E(\omega)|^2\,d\omega,$$

subject to the constraint equations (6) and (7) can be solved through 
a straightforward application of the method of Lagrangian multipliers. 

The expression of the optimum equalizer for sampling instant t₀
and demodulating carrier phase θ₀ is found to be


$$[E_0(\omega)]_{t_0,\theta_0} = \frac{C_1 R_d(\omega)}{Y_{eq}(\omega,\theta_0)}, \qquad (20)$$

where R_d(ω) is the desired received baseband equivalent signal
spectrum.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 3—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = −60°; (S/N)input =
21 dB.
It follows that the minimum output noise power is

$$[\min \sigma_o^2]_{t_0,\theta_0} = \frac{\sigma_i^2 T}{\pi}\int_0^{\pi/T}\psi_{eq}(\omega)\,\frac{C_1^2\,R_d^2(\omega)}{|Y_{eq}(\omega,\theta_0)|^2}\,d\omega. \qquad (21)$$

In general, Y_eq(ω, θ₀) is a function of the sampling instant t₀
and the demodulating carrier phase θ₀; therefore the minimum output
noise power depends upon t₀ and θ₀.


4.2 Class IV Partial-Response System 
It can be seen from equation (21) that the M.M.S.E. generally
depends upon t₀ and θ₀. However, for the SSB class IV partial-response
system,

$$\frac{|R_d(\omega)|}{|Y_{eq}(\omega,\theta_0)|} = \frac{1}{|T(\omega_c-\omega)|}, \qquad 0 \le |\omega| \le \pi/T. \qquad (22)$$

Therefore, the minimum output noise power is independent of t₀ and
θ₀.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 4—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = −30°; (S/N)input =
21 dB.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 5—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = −15°; (S/N)input =
21 dB.

Table II shows the values of minimum output noise power computed
by equations (21) and (22) for various α, βₘT, and σᵢ² under the same
assumptions made in Section III. For these calculations the transmitted
signal power is fixed at 4/7 watts.

Table III gives the difference in M.M.S.E. computed by equations
(17) and (21).
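A similar sketch can reproduce Table II and the difference column of Table III. It rests on several assumptions:

- ψ_eq(ω) = 2T sin T|ω| for the class IV system, as the denominator of (15) indicates;
- the ratio |R_d(ω)|/|Y_eq(ω, θ₀)| = 1/|T(ω_c − ω)| from (22), with C₁ = 1;
- the T/π normalization used in (31).

The helper names are hypothetical. The M.M.S.E. values subtracted at the end are transcribed from Table I rather than recomputed.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson's rule for f on [a, b]; n is forced to be even."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def zero_forcing_noise(sigma2, alpha, T=1.0, n=20000):
    """Minimum output noise power of eq. (21) for the channel of eq. (18).

    Assumes psi_eq(w) = 2T sin(T|w|) and |R_d|/|Y_eq| = 1/|T(w_c - w)| (eq. 22),
    so the integrand is 2T sin(wT) / |T(w_c - w)|^2 on 0 <= w <= pi/T.
    """

    def integrand(w):
        gain = 1.0 - alpha * abs(w) / (math.pi / T)
        return 2.0 * T * math.sin(w * T) / gain ** 2

    return sigma2 * (T / math.pi) * simpson(integrand, 0.0, math.pi / T, n)


# M.M.S.E. entries transcribed from Table I, units of 10^-2: (alpha=0.1, alpha=0.9)
TABLE_I = {4: (5.621, 22.57), 2: (2.818, 12.14), 0.4: (0.564, 2.74), 0.2: (0.282, 1.41)}

if __name__ == "__main__":
    for s, (m_01, m_09) in TABLE_I.items():
        zf_01 = 100.0 * zero_forcing_noise(s / 100.0, 0.1)
        zf_09 = 100.0 * zero_forcing_noise(s / 100.0, 0.9)
        print(f"sigma_i^2={s:>4}: sigma_o^2 = {zf_01:.3f}, {zf_09:.2f}; "
              f"difference = {zf_01 - m_01:.3f}, {zf_09 - m_09:.2f}")
```

Because the zero-forcing equalizer simply inverts |T(ω_c − ω)|, the noise power grows linearly with σᵢ² and is dominated by the band edge where the α = 0.9 channel is weakest. This is why the disparity in Table III is much larger for that channel.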


[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 6—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = 0°; (S/N)input =
21 dB.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 7—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = 15°; (S/N)input =
21 dB.


The results show that the M.M.S.E.s computed by equations (17) 
and (21) are almost the same if either the signal-to-noise ratio is 
large or the slope of the amplitude-frequency characteristic of the 
channel is small (e.g., in this case the slope is 0.1). Notice that either 
decreasing the signal-to-noise ratio or increasing the slope of the 
amplitude-frequency characteristic of the channel increases the 
disparity of the M.M.S.E.s obtained by equations (17) and (21). As an
example, if

σᵢ² = 0.02 watts/Hz

and

$$T(\omega_c-\omega) = \left(1 - 0.9\,\frac{|\omega|}{\pi}\right)e^{-j(\omega^3/3\pi^2)}, \qquad (23)$$

then the M.M.S.E.s obtained by equations (17) and (21) are 0.1214
and 0.1483, respectively. The M.M.S.E. is thus about 18 percent
lower when the equalizer that minimizes the mean-square intersymbol
interference plus noise is used.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 8—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = 30°; (S/N)input =
21 dB.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 9—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = 60°; (S/N)input =
21 dB.


V. SUMMARY AND CONCLUSION 


The optimum equalizer for a synchronous data system with a fixed
channel is derived in this study. Two different optimality criteria are
assumed: (i) the minimization of the output noise plus mean-square
intersymbol interference and (ii) the minimization of the output noise
subject to the constraint that the equalizer forces the intersymbol
interference to zero.

[Figure: M.M.S.E. of the 19-tap mean-square equalizer and M.M.S.E. of the infinite equalizer versus sampling time (0 to 0.8T).]
Fig. 10—M.M.S.E. versus sampling time; SSB class IV partial-response
system; baseband equivalent channel transfer function, {1 − 0.1[|ω|/(π/T)]}
exp[−j(ω³βₘT³/3π²)]; demodulating carrier phase, θ₀ = 90°; (S/N)input =
21 dB.

TABLE II—Minimum Output Noise Power Computed by Equation (21)

σᵢ² (10⁻² watts/Hz)   σₒ² (10⁻²)                        σₒ² (10⁻²)
                      α = 0.1, βₘT = 1, T = 1           α = 0.9, βₘT = 1, T = 1
4                     5.652                             29.66
2                     2.826                             14.83
1                     1.413                             7.42
0.4                   0.565                             2.97
0.2                   0.282                             1.49

Explicit expressions for the optimum equalizer and the correspond- 
ing M.M.S.E. are obtained. It is known that the M.M.S.E. at the 
equalizer output generally depends upon the sampling instant and the 
demodulating carrier phase. However, we have shown in this study 
that there exist cases where the M.M.S.E. is independent of the 
sampling instant and the demodulating carrier phase. The SSB class IV 
partial-response system represents a good example. Thus for such data 
systems, we may use arbitrary timing and carrier phase, thereby sig- 
nificantly reducing the receiver complexity and possibly the start-up 
time as well. The results calculated by the computer for an SSB 
class IV partial-response system equipped with a 19-tap mean-square 
equalizer show that the system error-rate for all 40 distinct combi- 
nations of sampling instants and carrier phases is less than 10°. The 
system is operated over a channel with linearly distorted amplitude- 
frequency characteristic and parabolically distorted delay-frequency 
characteristic (see Section III) which is worse than a worst-case C-2 
line. The signal-to-noise ratio at the receiver input is assumed to 
be 21 dB. The results also show that with either a small slope of the
amplitude-frequency characteristic of the channel or a large signal-to-
noise ratio, the M.M.S.E.s obtained by the two different criteria con-
sidered in this study are almost the same. For example, with white-
noise spectral density 0.02 watts/Hz (S/N at the receiver input is
18 dB) and the Fourier transform of the channel

$$\left(1 - 0.1\,\frac{|\omega|}{\pi}\right)e^{-j(\omega^3/3\pi^2)},$$

the M.M.S.E.s obtained by criteria (i) and (ii) are 0.02818 and 0.02826,
respectively. However, either increasing the slope or decreasing the
signal-to-noise ratio increases the disparity of the M.M.S.E.s obtained
by criteria (i) and (ii). Under these conditions, criterion (i) is much
preferred. For example, with the same white-noise spectral density as
before and the Fourier transform of the channel

$$\left(1 - 0.9\,\frac{|\omega|}{\pi}\right)e^{-j(\omega^3/3\pi^2)},$$

the M.M.S.E.s obtained by criteria (i) and (ii) are 0.121 and 0.148,
respectively.

TABLE III—Difference in M.M.S.E.

σᵢ² (10⁻² watts/Hz)   σₒ² − M.M.S.E. (10⁻²)             σₒ² − M.M.S.E. (10⁻²)
                      α = 0.1, βₘT = 1, T = 1           α = 0.9, βₘT = 1, T = 1
4                     0.031                             7.08
2                     0.008                             2.69
1                     0.002                             0.97
0.4                   ≈0                                0.23
0.2                   ≈0                                0.08
APPENDIX


Minimization of Noise Plus Intersymbol Interference (Binary and 
Partial-Response Systems) 


The details of the minimization procedure of noise-plus-intersymbol 
interference for the binary and partial-response systems will be given 
in this Appendix. 

The block diagram of a general digital data system is shown in 
Fig. 1. The characteristics of the ideal low-pass filter and the equalizer 
are assumed to be 


re = | 0<|w| So, , (24) 


0 otherwise 
and

$$E(\omega) = \sum_{n=-\infty}^{\infty} c_n\,e^{-jn\omega T}.$$
With input white Gaussian noise having one-sided power spectral
density σᵢ² watts/Hz, the variance of the noise at the receiver equalizer
output is found by splitting the noise integral, whose spectral weighting
is {|R(ω − ω_c)|² + |R(ω_c − ω)|²}*, into the intervals
(2K − 1)π/T ≤ ω ≤ (2K + 1)π/T and folding each interval into the
Nyquist band. This gives

$$\sigma_o^2 = E(n_o^2) = \frac{\sigma_i^2 T}{\pi}\int_0^{\pi/T}\psi_{eq}(\omega)\,|E(\omega)|^2\,d\omega, \qquad (25)$$

where ψ(ω) denotes that spectral weighting and

$$\psi_{eq}(\omega) = \psi(\omega) + \psi\!\left(\omega+\frac{2\pi}{T}\right) + \psi\!\left(\omega-\frac{2\pi}{T}\right) + \cdots + \psi\!\left(\omega\pm\frac{2N\pi}{T}\right) + \cdots. \qquad (26)$$

Similarly, with

$$Y(\omega,\theta) = S(\omega-\omega_c)\,T(\omega-\omega_c)\,R(\omega-\omega_c)\,e^{j\theta} + S(\omega_c+\omega)\,T(\omega_c+\omega)\,R(\omega_c+\omega)\,e^{-j\theta},$$

the equalizer output signal is folded into the Nyquist band in the same way:

$$y(t,\theta) = \mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T} Y_{eq}(\omega,\theta)\,E(\omega)\,e^{j\omega t}\,d\omega, \qquad (27)$$
* R(ω) is the receiver filter transfer function.
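where

$$Y_{eq}(\omega,\theta) = Y(\omega,\theta) + Y\!\left(\omega+\frac{2\pi}{T},\theta\right) + Y\!\left(\omega-\frac{2\pi}{T},\theta\right) + \cdots. \qquad (28)$$

By Parseval's theorem we can write

$$\sum_{n=-\infty}^{\infty} y_n^2(t_0) = \frac{T}{\pi}\int_0^{\pi/T}|Y_{eq}(\omega,\theta_0)|^2\,|E(\omega)|^2\,d\omega. \qquad (29)$$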


Therefore the normalized M.S.E. given by equation (4) can be
rewritten as


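$$[\text{M.S.E.}]_{t_0,\theta_0} = \frac{1}{y_0^2(t_0)}\left[\frac{\sigma_i^2 T}{\pi}\int_0^{\pi/T}\psi_{eq}(\omega)\,|E(\omega)|^2\,d\omega + B\right], \qquad (30a)$$

where

$$B = \frac{T}{\pi}\int_0^{\pi/T}|Y_{eq}(\omega,\theta_0)|^2\,|E(\omega)|^2\,d\omega - y_0^2(t_0). \qquad (30b)$$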
Since y₀(t₀) is fixed, minimizing $[\text{M.S.E.}]_{t_0,\theta_0}$ subject to the constraint
equation (6) is equivalent to minimizing the following function,


$$U = \frac{T}{\pi}\int_0^{\pi/T}\big[\sigma_i^2\psi_{eq}(\omega) + |Y_{eq}(\omega,\theta_0)|^2\big]\,|E(\omega)|^2\,d\omega, \qquad (31)$$


subject to the same constraint equation. 

The minimization problem can be solved through a straightforward 
application of the method of Lagrangian multipliers. 

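Solving

$$\frac{\partial}{\partial E(\omega)}\left\{\mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T}\Big(\big[\sigma_i^2\psi_{eq}(\omega) + |Y_{eq}(\omega,\theta_0)|^2\big]\,|E(\omega)|^2 + \lambda\,Y_{eq}(\omega,\theta_0)\,E(\omega)\,e^{j\omega t_0}\Big)\,d\omega\right\} = 0 \qquad (32)$$

and

$$C_1 = \mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T} Y_{eq}(\omega,\theta_0)\,E(\omega)\,e^{j\omega t_0}\,d\omega, \qquad (33)$$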

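we obtain the expression for the optimum equalizer, $[E_0(\omega)]_{t_0,\theta_0}$, at the
sampling instant t₀ and the demodulating carrier phase θ₀:

$$[E_0(\omega)]_{t_0,\theta_0} = \frac{\{Y_{eq}(\omega,\theta_0)\,e^{j\omega t_0}\}^*\big/\big[\sigma_i^2\psi_{eq}(\omega) + |Y_{eq}(\omega,\theta_0)|^2\big]}{\dfrac{T}{\pi}\displaystyle\int_0^{\pi/T}\frac{|Y_{eq}(\omega,\theta_0)|^2}{\sigma_i^2\psi_{eq}(\omega) + |Y_{eq}(\omega,\theta_0)|^2}\,d\omega}, \qquad (34)$$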


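where {X}* means the complex conjugate of X. Substituting $[E_0(\omega)]_{t_0,\theta_0}$
into equation (4), we obtain the M.M.S.E.

$$[\text{M.M.S.E.}]_{t_0,\theta_0} = \frac{1}{\dfrac{T}{\pi}\displaystyle\int_0^{\pi/T}\frac{|Y_{eq}(\omega,\theta_0)|^2}{\sigma_i^2\psi_{eq}(\omega) + |Y_{eq}(\omega,\theta_0)|^2}\,d\omega} - 1. \qquad (35)$$

Equation (35) can be further minimized over all possible sampling
instants and carrier phases to obtain a global minimum of the mean-square
error.

We now consider the class IV partial-response system. The transfer
functions of the equivalent baseband transmitted signal and receiver
filters are

$$S(\omega_c-\omega) = R(\omega_c-\omega) = \begin{cases}\sqrt{2T\sin T|\omega|}, & 0 \le |\omega| \le \pi/T,\\ 0, & \text{otherwise.}\end{cases}$$

It follows that

$$Y_{eq}(\omega,\theta) = \begin{cases}2T\sin\omega T\;e^{-j(\pi/2+\theta)\,\omega/|\omega|}\;T(\omega_c-\omega), & 0 \le |\omega| \le \pi/T,\\ 0, & \text{otherwise.}\end{cases}$$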
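The constraint equations are assumed to be

$$C = y_1(t_0) = \mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T} Y_{eq}(\omega,\theta_0)\,E(\omega)\,e^{j\omega(t_0+T)}\,d\omega \qquad (38a)$$

and

$$C - 2 = y_{-1}(t_0) = \mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T} Y_{eq}(\omega,\theta_0)\,E(\omega)\,e^{-j\omega(T-t_0)}\,d\omega, \qquad (38b)$$

where C is a constant.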
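We now wish to minimize the function

$$U = \mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T}\Big\{\big[\sigma_i^2\,2T|\sin\omega T| + 4T^2|\sin\omega T|^2\,|T(\omega_c-\omega)|^2\big]\,|E(\omega)|^2 + \lambda_1 Y_{eq}(\omega,\theta_0)E(\omega)e^{j\omega(t_0+T)} + \lambda_2 Y_{eq}(\omega,\theta_0)E(\omega)e^{j\omega(t_0-T)}\Big\}\,d\omega \qquad (39)$$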


subject to the constraint equations (38a) and (38b). The expression
for the optimum equalizer for a given constant C, sampling instant t₀,
and carrier phase θ₀ is


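$$[E_0(\omega)]_{C,t_0,\theta_0} = \frac{\dfrac{C(A_1-A_2)+2A_2}{A_1^2-A_2^2}\,\{Y_{eq}(\omega,\theta_0)\,e^{j\omega(t_0+T)}\}^* + \dfrac{C(A_1-A_2)-2A_1}{A_1^2-A_2^2}\,\{Y_{eq}(\omega,\theta_0)\,e^{j\omega(t_0-T)}\}^*}{\sigma_i^2\,2T|\sin\omega T| + 4T^2|\sin\omega T|^2\,|T(\omega_c-\omega)|^2}, \qquad 0 \le |\omega| \le \pi/T, \qquad (40)$$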


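where

$$A_1 = \frac{T}{\pi}\int_0^{\pi/T}\frac{2T\sin\omega T\,|T(\omega_c-\omega)|^2}{\sigma_i^2 + 2T\sin\omega T\,|T(\omega_c-\omega)|^2}\,d\omega \qquad (41a)$$

and

$$A_2 = \mathrm{Re}\,\frac{T}{\pi}\int_0^{\pi/T}\frac{2T\sin\omega T\,|T(\omega_c-\omega)|^2\,e^{j2\omega T}}{\sigma_i^2 + 2T\sin\omega T\,|T(\omega_c-\omega)|^2}\,d\omega. \qquad (41b)$$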
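The coefficients in (40) follow from the two constraints. For brevity, write the optimum as E₀(ω) = [λ₁{Y_eq e^{jω(t₀+T)}}* + λ₂{Y_eq e^{jω(t₀−T)}}*]/D(ω). Here D(ω) is a shorthand introduced only for this step, standing for the denominator of (40). Then |Y_eq|²/D(ω) is exactly the real integrand of (41a). Substituting this form into (38a) and (38b), and using Re{e^{−j2ωT}} = Re{e^{j2ωT}} for the cross term, gives a 2 × 2 linear system whose solution reproduces the coefficients of (40):

```latex
\begin{aligned}
\lambda_1 A_1 + \lambda_2 A_2 &= C,\\
\lambda_1 A_2 + \lambda_2 A_1 &= C - 2,
\end{aligned}
\qquad\Longrightarrow\qquad
\lambda_1 = \frac{C(A_1-A_2)+2A_2}{A_1^2-A_2^2},\qquad
\lambda_2 = \frac{C(A_1-A_2)-2A_1}{A_1^2-A_2^2}.
```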
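It follows that the M.M.S.E. is

$$[\text{M.M.S.E.}]_{C,t_0,\theta_0} = \frac{T}{\pi}\int_0^{\pi/T}\big[\sigma_i^2\,2T|\sin\omega T| + 4T^2|\sin\omega T|^2\,|T(\omega_c-\omega)|^2\big]\,\big|[E_0(\omega)]_{C,t_0,\theta_0}\big|^2\,d\omega + 2C^2 - 4C. \qquad (42)$$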


Since $|Y_{eq}(\omega,\theta_0)|^2$ is independent of t₀ and θ₀, both $|[E_0(\omega)]_{C,t_0,\theta_0}|^2$
and $[\text{M.M.S.E.}]_{C,t_0,\theta_0}$ are independent of t₀ and θ₀.
Equation (42) can be further minimized with respect to C by solving

$$\frac{\partial}{\partial C}\,[\text{M.M.S.E.}]_{C,t_0,\theta_0} = 0. \qquad (43)$$


The optimum solution, C = 1, provides the M.M.S.E. for a class IV
partial-response system:

$$[\text{M.M.S.E.}]_{C=1} = \frac{2}{A_1-A_2} - 2 = \min_{\text{all } C}\,[\text{M.M.S.E.}]_C. \qquad (44)$$
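The step from (42) and (43) to C = 1 can be made explicit. Denote by λ₁ and λ₂ the two coefficients multiplying the conjugate terms in (40). The integral term of (42) then equals λ₁²A₁ + λ₂²A₁ + 2λ₁λ₂A₂. Because λ₁A₁ + λ₂A₂ = C and λ₁A₂ + λ₂A₁ = C − 2 by (38a) and (38b), that integral term collapses to λ₁C + λ₂(C − 2). Substituting the coefficients gives:

```latex
\begin{aligned}
[\text{M.M.S.E.}]_C &= \lambda_1 C + \lambda_2 (C-2) + 2C^2 - 4C
  = (2C^2 - 4C)\left(1 + \frac{1}{A_1 + A_2}\right) + \frac{4A_1}{A_1^2 - A_2^2},\\
\frac{\partial}{\partial C}[\text{M.M.S.E.}]_C &= (4C - 4)\left(1 + \frac{1}{A_1 + A_2}\right) = 0
  \;\Longrightarrow\; C = 1,\\
[\text{M.M.S.E.}]_{C=1} &= -2 - \frac{2}{A_1+A_2} + \frac{4A_1}{(A_1-A_2)(A_1+A_2)}
  = \frac{2}{A_1 - A_2} - 2 .
\end{aligned}
```

The stationary point is a minimum. The integrand of A₁ + A₂ is the non-negative integrand of (41a) multiplied by 1 + cos 2ωT ≥ 0, so A₁ + A₂ > 0 and the factor 1 + 1/(A₁ + A₂) is positive.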


In the absence of channel noise, σᵢ² = 0; then

$$A_1 = 1 \qquad (45)$$

and

$$A_2 = 0. \qquad (46)$$

Hence,

$$[\text{M.M.S.E.}]_{C=1} = 0. \qquad (47)$$
A Low-Noise Metal-Semiconductor-Metal (MSM) Microwave Oscillator


By D. J. COLEMAN, JR., and S. M. SZE 
(Manuscript received January 22, 1971) 


I. INTRODUCTION 


Low-noise microwave CW oscillations have been obtained from
metal-semiconductor-metal (MSM) structures made from a 10-µm-thin
slice of silicon sandwiched between two PtSi Schottky barrier
contacts. Microwave CW power up to 50 mW has been obtained at
5 GHz with efficiency up to 1.8 percent. The FM noise measure 1 MHz
from the carrier is 22.8 dB, which is considerably lower than that of a
silicon avalanche oscillator. The mechanisms responsible for the
microwave oscillation are (i) the exponential increase of the local carrier
population due to injection of minority carriers at the forward-biased
contact and (ii) the transit-time delay of injected carriers traversing
the depletion region. By optimizing material and device parameters, it 
is believed that higher efficiency and higher power microwave oscilla- 
tions can be obtained from the MSM and its related structures with 
the inherent low-noise characteristics. 


II. DEVICE FABRICATION 


Single-crystal n-type silicon wafers with 11-Ω·cm resistivity
(4 × 10¹⁴ cm⁻³ doping), (111) oriented, and with a dislocation density less
than 100/cm² were Syton polished on both sides to a final thickness
of 10 ± 2 µm. Platinum of 500 Å thickness was sputtered onto both
sides of the wafer and was sintered to form approximately 1000 Å
PtSi on both sides. Chromium of 300 Å was deposited on one side;
this was followed by a 3000 Å layer of Au evaporation. The same
depositions were then made on the other side. A standard photo-
lithographic method was used to define circular patterns of gold dots
with areas of 5 × 10⁻⁴ cm². The devices were separated by etching
and were mounted onto V-type microwave packages.
III. DC CHARACTERISTICS


A schematic diagram of an MSM structure is shown in Fig. 1a. The
band diagram at thermal equilibrium is shown in Fig. 1b for an n-type
semiconductor, where φₙ₁ and φₙ₂ are the barrier heights for the two
metal-semiconductor contacts, respectively. For the PtSi-Si-PtSi struc-
ture mentioned previously, φₙ₁ = φₙ₂ = 0.85 eV. Figure 1c shows the
energy band diagram when a voltage is applied. We have electron cur-
rent from the reverse-biased contact and hole current from the
forward-biased contact.

The measured I-V characteristics at 300°K and 77°K of a repre-
sentative device are shown in Fig. 2. The rapid increase in terminal
current with applied voltage (above 30 volts) is caused by thermionic
hole injection into the semiconductor as the depletion layer of the
reverse-biased contact reaches through the entire device thickness.
This critical voltage is approximately given by qNL²/2εₛ, where N is
the doping concentration, L the semiconductor thickness, and εₛ the
dielectric permittivity.¹ The current increase is not due to avalanche
multiplication, as is apparent from the magnitude of the critical
voltage and its negative temperature coefficient. At 77°K, the rapid
increase is terminated at a current of about 10° amps. This saturated
current is expected from the thermionic emission theory of hole injec-
tion¹ from the forward-biased contact with a hole barrier height
(φₚ₂) of about 0.15 eV.

Fig. 1—(a) Schematic diagram of a metal-semiconductor-metal (MSM) struc-
ture. (b) Energy band diagram of an MSM structure in thermal equilibrium.
(c) Energy band diagram of an MSM structure under biasing condition.

Fig. 2—Measured current vs voltage of a silicon MSM structure (PtSi-Si-PtSi)
at two temperatures. The device parameters are L = 10 µm, N = 4 × 10¹⁴
cm⁻³, φₙ₁ = φₙ₂ = 0.85 eV, and an area of 5 × 10⁻⁴ cm².
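As a rough check of the reach-through estimate qNL²/2εₛ, the sketch below evaluates it for the Fig. 2 device (N = 4 × 10¹⁴ cm⁻³, L = 10 µm). The relative permittivity of silicon, taken here as 11.9, is an assumed textbook value that the text does not state. The function name is hypothetical.

```python
# Physical constants in SI units
Q = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
EPS_R_SI = 11.9           # assumed relative permittivity of silicon (not from the text)


def reach_through_voltage(doping_cm3, thickness_um, eps_r=EPS_R_SI):
    """Approximate reach-through (critical) voltage q*N*L^2 / (2*eps_s), in volts."""
    n = doping_cm3 * 1e6           # cm^-3 -> m^-3
    length = thickness_um * 1e-6   # micrometers -> meters
    eps_s = eps_r * EPS0
    return Q * n * length ** 2 / (2.0 * eps_s)


if __name__ == "__main__":
    v = reach_through_voltage(4e14, 10.0)
    print(f"critical voltage for the Fig. 2 device: about {v:.1f} V")
    # Halving the thickness cuts the voltage by four (V is proportional to L^2)
    print(f"with L = 5 um: about {reach_through_voltage(4e14, 5.0):.1f} V")
```

Under these assumptions the estimate comes out near 30 V, in line with the onset of the rapid current increase above 30 volts in Fig. 2. The quadratic dependence on L is the same scaling invoked later for the thinner 6-GHz diode.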
IV. MICROWAVE PERFORMANCE


CW microwave performance of the MSM devices was measured in
a coaxial Impatt circuit described by D. E. Iglesias.² Microwave
power was obtainable over the entire C band of 4-8 GHz. The maxi- 
mum power observed was 50 mW at 4.9 GHz. The maximum efficiency 
approached 1.8 percent. Figure 3 shows some of the measured micro- 
wave power versus current with frequency of operation indicated on 
each curve for three typical devices tested. The voltage indicated
in parentheses labeling each curve is the average bias voltage at the
diode while oscillating. Because of the symmetry of the structure, it 
could be operated with either polarity of bias voltage, and similar 
results were obtained. 

The highest-power unit was tested for FM noise when tuned to a
frequency of 4.88 GHz. The FM single-sideband noise measure 1 MHz
from the carrier frequency was found to be 22.8 dB at 7 mA bias
current. This noise measure is considerably lower than that of a
silicon Impatt diode and is comparable to that of a GaAs transferred-
electron oscillator.

Fig. 3—CW microwave output vs input current for three Si MSM devices. Also
indicated are the operating frequency and the average bias voltage while
oscillating.

The 6-GHz diode was used to build a stable negative-conductance
linear amplifier. A gain-bandwidth product of 200 MHz was obtained
with 19 dB gain at 5 mA bias. The small-signal noise measure was
15 ± 1 dB.

The mechanisms responsible for the microwave oscillations are
believed to be (i) the rapid increase of the carrier injection process caused
by the decreasing potential barrier of the forward-biased metal-
semiconductor contact and (ii) an apparent 3π/2 transit angle of the
injected carriers which traverse the semiconductor depletion region.
For the 6-GHz diode, the thickness L is smaller. This results in a higher
frequency (since frequency is inversely proportional to L) and a lower
critical voltage (which is proportional to L²). Since the main noise
source for thermionic emission processes is shot noise, one would
expect a low noise measure. This is indeed observed experimentally.

If a large barrier height can be obtained for a p-type semiconductor, 
one can make a complementary MSM structure in the same way as 
described here. Since the reverse-biased metal-semiconductor contact 
serves mainly as a blocking contact until the reach-through voltage
is reached, it is conceivable that this contact can be replaced by a p-n
junction, such as a p⁺-n-metal structure. By optimizing the material
parameters (such as doping profile, barrier heights, and semiconductor
thickness) and device geometry and topology,³ it is believed that
higher efficiency and higher power microwave oscillations can be ob- 
tained from the MSM and its related structures with the inherent 
low-noise characteristics. 